Parallel Bulk-Loading of Spatial Data with MapReduce: An R-tree Case

Bibliographic Details
Published in: Wuhan University Journal of Natural Sciences, Vol. 16, no. 6, pp. 513-519
Main Authors: Liu, Yi; Jing, Ning; Chen, Luo; Chen, Huizhong
Format: Journal Article
Language: English
Published: Heidelberg: Wuhan University, 01.12.2011

More Information
Summary: Existing approaches to parallel bulk-loading of R-tree indexes share a drawback: the quality of the resulting spatial index decreases considerably as the degree of parallelism increases. To solve this problem, a novel method for bulk-loading spatial data using the popular MapReduce framework is proposed. Within MapReduce, the method combines the Hilbert curve with random sampling to partition and sort spatial data in parallel, which balances the number of spatial objects across partitions. A bottom-up method is then introduced to simplify and accelerate the construction of the sub-index in each partition. Three area metrics are used to evaluate the quality of the generated index under different partitioning settings. Extensive experiments show that the generated R-trees are of similar quality to an R-tree built with a sequential bulk-loading method, while execution time is reduced considerably by exploiting parallelism.
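The pipeline in the summary (Hilbert keys, sample-based range partitioning, per-partition sort, bottom-up sub-index construction) can be illustrated with the following minimal Python sketch. It is not the authors' implementation and does not run on Hadoop; it simulates the map, shuffle, and reduce phases sequentially. Assumptions, all hypothetical: coordinates are normalized to [0, 1], each rectangle is keyed by the Hilbert value of its center on a 2^16 x 2^16 grid, partition boundaries are quantiles of a random sample of keys, and "bottom-up" construction is modeled as simple Hilbert packing (consecutive sorted entries grouped into nodes of at most `fanout` children). Function names such as `parallel_bulk_load` and parameters such as `sample_rate` are illustrative only.

```python
import bisect
import random


def hilbert_d(order, x, y):
    """Distance of grid cell (x, y) along a Hilbert curve on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip so the next sub-quadrant has the standard orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_key(rect, order):
    """Hilbert value of a rectangle's center; assumes coordinates normalized to [0, 1]."""
    n = 1 << order
    cx = (rect[0] + rect[2]) / 2
    cy = (rect[1] + rect[3]) / 2
    return hilbert_d(order, min(int(cx * n), n - 1), min(int(cy * n), n - 1))


def choose_splitters(keys, num_parts, sample_rate, rng):
    """Pick num_parts - 1 split keys as quantiles of a random sample of the keys."""
    sample = sorted(k for k in keys if rng.random() < sample_rate) or sorted(keys)
    if not sample:  # no keys at all
        return []
    return [sample[i * len(sample) // num_parts] for i in range(1, num_parts)]


def mbr(rects):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pack_level(items, fanout, box):
    """Group consecutive items into parent nodes (mbr, children) of at most `fanout` items."""
    return [(mbr([box(c) for c in items[i:i + fanout]]), items[i:i + fanout])
            for i in range(0, len(items), fanout)]


def pack_to_root(level, fanout):
    """Repeatedly pack a list of nodes until a single root remains (None if empty)."""
    while len(level) > 1:
        level = pack_level(level, fanout, box=lambda node: node[0])
    return level[0] if level else None


def parallel_bulk_load(rects, num_parts, fanout, order=16, sample_rate=0.1, seed=0):
    """Sequential simulation of the partition / sort / bottom-up build pipeline."""
    rng = random.Random(seed)
    # Map-like step: attach a Hilbert key to every rectangle.
    keyed = [(hilbert_key(r, order), r) for r in rects]
    splitters = choose_splitters([k for k, _ in keyed], num_parts, sample_rate, rng)
    # Shuffle-like step: route each record to the partition owning its key range.
    parts = [[] for _ in range(num_parts)]
    for k, r in keyed:
        parts[bisect.bisect_right(splitters, k)].append((k, r))
    # Reduce-like step: sort each partition by key and pack its sub-tree bottom-up.
    subtrees = []
    for part in parts:
        part.sort(key=lambda kr: kr[0])
        leaves = pack_level([r for _, r in part], fanout, box=lambda r: r)
        root = pack_to_root(leaves, fanout)
        if root is not None:
            subtrees.append(root)
    # Partitions hold increasing key ranges, so sub-tree roots are already in curve order.
    # Note: sub-trees of unequal height are not rebalanced in this sketch.
    return pack_to_root(subtrees, fanout), [len(p) for p in parts]


if __name__ == "__main__":
    rng = random.Random(42)
    data = []
    for _ in range(10_000):
        x, y = rng.random() * 0.99, rng.random() * 0.99
        data.append((x, y, x + 0.01, y + 0.01))
    root, sizes = parallel_bulk_load(data, num_parts=4, fanout=50)
    print("partition sizes:", sizes)
    print("root MBR:", root[0])
```

Because each partition covers a contiguous range of Hilbert keys, the partition roots are already in curve order and can be packed the same way as the lower levels. Sampling matters here: roughly equal partition sizes tend to give sub-trees of similar height, which this simplified sketch relies on instead of explicitly rebalancing them.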
Keywords: parallel bulk-loading; MapReduce; R-tree; query processing
CN: 42-1405/N
ISSN: 1007-1202; 1993-4998
DOI: 10.1007/s11859-011-0790-3