by Suprio Ray University of Toronto T Space

By Suprio Ray University Of Toronto T Space-PDF Download

  • Date:29 Jul 2020
  • Views:5
  • Downloads:0
  • Pages:148
  • Size:9.74 MB

Share Pdf : By Suprio Ray University Of Toronto T Space

Download and Preview : By Suprio Ray University Of Toronto T Space


Report CopyRight/DMCA Form For : By Suprio Ray University Of Toronto T Space


Transcription:

High Performance Spatial and Spatio temporal Data Processing. Suprio Ray,Doctor of Philosophy,Graduate Department of Computer Science. University of Toronto, The rapid growth of spatial data volume and technological trends in storage capacity and processing power are. fueling many emerging spatial and spatio temporal applications from a wide range of domains Spatial join is. at the heart of many of the emerging spatial data analysis applications However spatial join processing on. even a moderate sized dataset is very time consuming At the same time there is a rapid expansion in available. processing cores through multicore machines and Cloud computing The confluence of these trends points to a. need for effective parallelization of spatial query processing Unfortunately traditional parallel spatial databases. are ill equipped to deal with the performance heterogeneity that is common in the Cloud We introduce a parallel. spatial data analysis infrastructure that exploits all available cores in a heterogeneous cluster We also present an. approach to parallelize spatial join queries in a large main memory multicore machine. Among the emerging spatio temporal applications Location Based Services LBS are the most prominent. These applications are characterized by high rate of updates and many concurrent short running range queries. We present a parallel in memory spatio temporal indexing technique to support the demands of LBS workloads. Our system achieves significantly better performance than existing approaches to indexing spatio temporal data. We also introduce a parallel in memory spatio temporal topological join approach. Acknowledgements, The pursuit of a PhD degree is a significant endeavour in terms of time energy and financial implications In. my journey in this pursuit I have spent many a hours reading research papers writing and testing code running. experiments and writing technical documents But this effort may not have been successful without the support. of a number of people, First of all I would like to thank my adviser Prof Angela Demke Brown for all her support during these past. years Prof Demke Brown has been a wonderful mentor who is one of the smartest persons I have ever met Her. office door was always open for me whenever I needed to discuss something No matter what she would have been. doing she always gave me undivided attention She gave me the freedom to pursue topics that interest me At. the same time she provided valuable advice as to how to remain focused Her tireless support during the writing. of research papers is inspiring I have learnt a lot from her as to how to address and solve a research problem. and how to present it to an audience I express my sincere gratitude to Prof Demke Brown for her support and. guidance that has made this thesis a reality, I owe my deep gratitude to Prof Ryan Johnson and Prof Nick Koudas who are my thesis committee members.
They have been my mentors and collaborators throughout this endeavour During the various checkpoint meetings. they provided me with valuable feedback I could always count on them for insightful suggestions on how to. improve my work, I am grateful to Dr Rolando Blanco and Dr Anil Goel who were my mentors at SAP They allowed me to. pursue a research topic of my choice I received valuable inputs from them during my internship at SAP. I thank Prof Ashvin Goel and Prof Eyal de Lara for agreeing to be the internal examiners They also. generously offered support at different times by letting me use some of their servers My special thanks to the. external examiner Prof Jo rg Sander from University of Alberta for patiently reading my thesis offering valuable. suggestions and writing a kind appraisal report I am grateful to Prof Alexandra Fedorova from Simon Fraser. University who believed in me and inspired me to pursue a PhD. I would like to thank my lab mate Bogdan Simion Bogdan has always been a wonderful friend of mine and I. admire him for his gentle and cheerful character He collaborated with me in the Jackpine project and helped me. by exploring a few macro benchmarks setting up databases and running experiments He also assisted me in the. Niharika project by helping to create the large dataset We had many an interesting discussion about our research. I also thank all the members of SysNet lab I have made many friends and I shall cherish the fond memories. Finally I thank my wife for her support and patience I am indebted to my parents for their love and support. and it is futile to try to thank them I also thank my brother for his support. Last but not least I would like to thank all the people I met and worked with during these years whose direct. and indirect support has made a difference,1 Introduction 1. 1 1 Benchmarking Spatial Queries 2,1 2 Spatial Data Management 2. 1 3 Spatio Temporal Data Management 3,1 4 Organization Of The Dissertation 4. 2 Related Works 5,2 1 Database Benchmarking 5,2 1 1 Non spatial database benchmarks 5.
2 1 2 Spatial Database Benchmarks 6,2 2 Spatial Data Management 7. 2 2 1 Spatial Data Representation 8,2 2 2 Spatial Index 9. 2 2 3 External Memory Spatial Join 10,2 2 4 In Memory Spatial Join 13. 2 2 5 Parallel Spatial Join 13, 2 2 6 Big Data Infrastructures For Spatial Query Processing 16. 2 3 Spatio Temporal Data Management 19,2 3 1 Spatio Temporal Index 19.
2 3 2 Spatio Temporal Join And Trajectory Index 24. 3 Benchmarking Spatial Queries 26,3 1 Introduction 26. 3 2 The Benchmark 27,3 2 1 Implementation Overview 27. 3 2 2 Data Model 29,3 2 3 Micro Benchmark 29,3 2 4 Macro Benchmark 30. 3 3 Experimental Setup 34,3 4 Benchmark Results 35. 3 4 1 Data Loading 35,3 4 2 Micro Benchmark 36,3 4 3 Macro Benchmark 40.
3 4 4 Overall Score 40,3 5 Discussion 41, 4 Performance Heterogeneity Aware Parallel Spatial Join For The Cloud 43. 4 1 Introduction 43, 4 1 1 A Case For Parallelizing Spatial Join Queries 43. 4 1 2 Heterogeneity In Computing Clusters 44,4 1 3 Load Balancing 46. 4 2 Niharika 47,4 2 1 Architecture 47,4 2 2 Spatial Declustering 48. 4 2 3 Load Balancing 48, 4 2 4 Heterogeneity aware single assignment of partitions 52.
4 2 5 Multi Round Assignment Of Partitions 52,4 3 Experimental Evaluation 58. 4 3 1 Experimental Setup 58,4 3 2 Results With Dataset That Fits In Memory 58. 4 3 3 Results With Larger Dataset USA 59,4 4 Scalability and Data Placement 60. 4 5 Chapter Summary 63,5 Parallel In Memory Spatial Join 64. 5 1 Introduction 64,5 1 1 Processing Skew 64,5 2 SPINOJA 65.
5 2 1 System Organization 66,5 2 2 MOD Quadtree Declustering 67. 5 2 3 Work Metrics 69,5 2 4 Processing Of Spatial Predicates 72. 5 2 5 Load Balancing 73, 5 2 6 Determining The Number Of Partitions Tiles 76. 5 2 7 Managing Processing Skew 80,5 3 Experimental Evaluation 80. 5 3 1 Experimental Setup 80,5 3 2 Multicore Scalability 80.
5 3 3 Comparison With Other In Memory Spatial Join Approaches 81. 5 3 4 Comparison With PostgreSQL 83,5 4 Chapter Summary 84. 6 High Performance Spatio Temporal Index For Location Based Services LBS 85. 6 1 Introduction 85,6 2 Design Considerations 87,6 2 1 Main Memory Storage For LBS 87. 6 2 2 A Novel Parallel In Memory Index 87,6 3 Overall System Organization 88. 6 4 Insert Efficient Main Memory Storage 88,6 5 PASTIS 90. 6 5 1 Index Structure 90,6 5 2 Update Processing 92.
6 5 3 Load Balancing And Skew Handling 93,6 5 4 Query Processing 94. 6 6 Discussion 96,6 7 Performance Studies 97,6 7 1 Experimental Setup 97. 6 7 2 Update Performance 98,6 7 3 Query Performance 101. 6 7 4 Comparison With Other Indexes And Other LBS Systems 103. 6 8 Chapter Summary 103, 7 Trajectory Based Spatio Temporal Topological Join 105. 7 1 Introduction 105,7 2 Case Study And Motivation 106.
7 3 Spatio Temporal Join Queries 107, 7 3 1 Trajectory Based Spatio Temporal Topological Join 107. 7 3 2 Spatio Temporal Topological Predicates 108,7 3 3 Predicates Evaluation 108. 7 4 Applicable Spatio Temporal Join Algorithms 109. 7 4 1 Nested Loop Join NLJ 109, 7 4 2 Indexed Nested Loop Join With One Index INLJ1I 109. 7 4 3 Indexed Nested Loop Join With Two Indexes INLJ2I 109. 7 5 Our Approach 111,7 5 1 Trajectory Filtering INLJ2I TF 111. 7 5 2 Evaluation And Analysis 112,7 5 3 Overview Of PISTON 113.
7 5 4 A Parallel In Memory Join Algorithm 114,7 5 5 In Memory Spatial Index 115. 7 5 6 In Memory Trajectory Index 118,7 6 Experimental Evaluation 120. 7 6 1 Dataset With Static Polygons 120,7 6 2 Dataset With Evolving Polygons 125. 7 7 Chapter Summary 127,8 Conclusion 128,8 0 1 Future work 129. Bibliography 130,List of Tables, 3 1 Database tables used for micro and macro benchmark 28.
3 2 Topological Relations in Dimensionally Extended 9 intersection Model included in benchmark 30. 3 3 Micro benchmark queries 31,3 4 Micro benchmark inserts 32. 3 5 Macro benchmark queries 33,3 6 Databases used in evaluation 35. 3 7 Load time importing shapefiles index creation update stats 36. 3 8 Overall scores 41,4 1 Some use cases of spatial join queries 44. 4 2 Database tables 44, 4 3 Jackpine queries and abbreviations with two datasets 45. 4 4 Illustration of data placement 62,5 1 Jackpine queries and abbreviations 65.
5 2 Point density per tile for Ed cr Ed 80,5 3 Database tables 81. 6 1 Trace file details 101,6 2 Parameter settings 102. 7 1 Queries with different predicates and abbreviations 108. 7 2 Details of trajectory dataset 120,7 3 Trajectory dataset generation settings 121. 7 4 Details of polygon dataset 121, 7 5 Evolving polygons dataset generation settings 125. List of Figures, 2 1 MBR representation and the two step spatial query evaluation 8.
2 2 Spatial partitioning 9,3 1 Benchmark output report 27. 3 2 Pairwise spatial joins involving polygons lines and points 36. 3 3 Pairwise spatial joins involving Polygons Areas only 37. 3 4 Spatial join with a given object 37, 3 5 Spatial analysis N S scenario not supported due to unsupported operations 38. 3 6 Data insertion 39,3 7 Macro Benchmark N S scenario not supported 39. 4 1 Min and max single node PostgreSQL with PostGIS execution times for Jackpine queries ob. served on 10 distinct EC2 m1 xlarge instances warm runs 46. 4 2 Min and max observed execution times of 7 Jackpine queries in a cluster of local nodes cold runs 46. 4 3 Architecture of Niharika 47, 4 4 Spatial partitioning a and declustering using Hilbert SFC traversal and tile aggregation into. partitions b 47,4 5 Spatial join with partitioned tables 50.
4 6 Execution times for LiAus query with different partition strategies 51. 4 7 Query execution times original vs Sharding StoredProc California dataset 52. 4 8 Query execution times original vs Sharding StoredProc USA dataset 52. 4 9 Algorithm1 Multi round fixed size batch assignment 53. 4 10 Algorithm2 Multi round fixed size batch assignment using node processing capacity 55. 4 11 Algorithm3 Multi round batch assignment from a preferred partition set 56. 4 12 Procedure StrategicAssignment for Algorithm3 57. 4 13 Speedup of Algorithm1 compared with Algorithm2 California dataset 4 cores node 58. 4 14 Algorithm3 vs Algorithm1 vs RR ESP speedup USA dataset 4 cores node 59. 4 15 Histogram of the processing times of the partitions for LtAus USA dataset Algorithm3 59. 4 16 Algorithm3 vs Algorithm1 vs RR ESP speedup with modified dataset USA dataset 4 cores node 60. 4 17 Algorithm4 Multi round batch assignment to idle nodes with fetch 61. 4 18 Algorithm4 speedup in a cluster single core 62. 5 1 execution time spent in filter vs refinement with Clone Join IM and INLJ 65. 5 2 Histogram of the execution time per spatial partition tile for query Ed cr Al 66. 5 3 SPINOJA system organization 67,5 4 MOD Quadtree object decomposition 68. 5 5 Procedure getWorkMetric4Tile 68, 5 6 Histogram of total area in hectares per tile of Arealm landmass polygons table 69. 5 7 Histogram of total length in km per tile of Edges polylines table 70. 5 8 The cost of evaluating polygon overlaps polygon 71. 5 9 Execution times of queries with different metrics 2 cores 72. 5 10 Processing of predicate Polygon overlaps polygon 72. 5 11 Processing of predicate Polygon contains polygon 73. 5 12 Algorithm Load balance TileNLJ 75,5 13 Algorithm Load balance TilePS 76. 5 14 Algorithm Load balance TilePS sepRefine 77, 5 15 Execution times of queries with different load balancing strategies 2 cores 78. 5 16 Candidate set size of queries with different load balancing strategies 2 cores 78. 5 17 Break down of time with different load balancing strategies 2 cores 78. 5 18 Cache misses with different load balancing strategies 2 cores 79. 5 19 Execution times of queries with different number of tiles 2 cores 79. 5 20 Candidate set size of queries with different number of tiles 2 cores 79. 5 21 Execution times of different tiles for the query Ed cr Al 79. 5 22 Multicore speedup of SPINOJA over 1 core performance 2 4 6 and 8 cores 81. 5 23 Memory usage for queries with SPINOJA vs other approaches 8 cores 81. 5 24 Execution times of queries with SPINOJA vs other approaches 8 cores 81. 5 25 Breakdown of execution times of queries with SPINOJA vs other approaches sequential execution 82. 5 26 Execution times of queries with SPINOJA vs PostgreSQL 83. Suprio Ray Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2015 The rapid growth of spatial data volume and technological trends in storage capacity and processing power are fueling many emerging spatial and spatio temporal applications from a wide range of domains Spatial join is at the heart of many of the emerging spatial data analysis applications

Related Books