Parallel Bulk-Loading of Spatial Data with MapReduce: An R-tree Case
Published in | Wuhan University Journal of Natural Sciences, Vol. 16, no. 6, pp. 513-519 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Heidelberg; Wuhan University; 01.12.2011 |
Subjects | |
Summary: | Existing approaches to parallel bulk-loading of R-tree indexes share a drawback: the quality of the resulting spatial index degrades considerably as the degree of parallelism increases. To address this problem, a novel method for bulk-loading spatial data using the MapReduce framework is proposed. The method combines the Hilbert curve with random sampling to partition and sort the spatial data in parallel, which balances the number of spatial objects across partitions. A bottom-up method is then used to simplify and accelerate sub-index construction in each partition. Three area metrics are used to evaluate the quality of the generated index under different partitionings. Extensive experiments show that the generated R-trees are similar in quality to an R-tree built with a sequential bulk-loading method, while execution time is reduced considerably by exploiting parallelism. |
---|---|
Bibliography: | parallel bulk-loading; MapReduce; R-tree; query processing 42-1405/N |
ISSN: | 1007-1202 1993-4998 |
DOI: | 10.1007/s11859-011-0790-3 |
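
The summary names three steps: Hilbert-curve ordering, sample-based partitioning, and bottom-up sub-index construction. The sketch below illustrates them in a single process, as a rough aid to reading the abstract. It is not the authors' implementation and does not use MapReduce. The function names (`hilbert_index`, `sample_boundaries`, `partition`, `pack_leaves`), the integer grid, the sample size, and the node capacity are all assumptions made for illustration, since the record gives none of these details.

```python
import bisect
import random


def hilbert_index(order, x, y):
    """Position of integer cell (x, y) along a Hilbert curve on a 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the lower-order bits follow the curve.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sample_boundaries(points, num_partitions, sample_size, order, seed=0):
    """Estimate split keys from a random sample so partitions are roughly equal."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    keys = sorted(hilbert_index(order, x, y) for x, y in sample)
    # Take num_partitions - 1 evenly spaced quantiles of the sampled keys.
    return [keys[(i * len(keys)) // num_partitions]
            for i in range(1, num_partitions)]


def partition(points, boundaries, order):
    """Assign each point to a key range, then sort each range by Hilbert key."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for x, y in points:
        key = hilbert_index(order, x, y)
        parts[bisect.bisect_right(boundaries, key)].append((key, (x, y)))
    for part in parts:
        part.sort()
    return parts


def pack_leaves(sorted_part, capacity):
    """Bottom-up step: group consecutive points into leaves with bounding boxes."""
    leaves = []
    for i in range(0, len(sorted_part), capacity):
        pts = [p for _, p in sorted_part[i:i + capacity]]
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        leaves.append(((min(xs), min(ys), max(xs), max(ys)), pts))
    return leaves


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.randrange(256), rng.randrange(256)) for _ in range(10_000)]
    splits = sample_boundaries(data, num_partitions=4, sample_size=500, order=8)
    for part in partition(data, splits, order=8):
        print(len(part), "points,", len(pack_leaves(part, capacity=50)), "leaves")
```

In a MapReduce setting, the sampling pass would plausibly run first to produce the split keys, the key-range assignment would act as the partitioner, and each reducer would pack its own sorted range into a sub-index. That mapping is an assumption based on the abstract alone.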