Parallel Bulk-Loading of Spatial Data with MapReduce: An R-tree Case



Bibliographic Details
Published in	Wuhan University Journal of Natural Sciences Vol. 16; no. 6; pp. 513-519
Main Authors	Liu, Yi; Jing, Ning; Chen, Luo; Chen, Huizhong
Format	Journal Article
Language	English
Published	Heidelberg Wuhan University 01.12.2011
Subjects	Biomedical and Life Sciences Computer Science Construction Hilbert curve Life Sciences Materials Science Partitions Random sampling query processing TP 391 R-tree parallel bulk-loading MapReduce

More Information
Summary:	Existing methods for parallel bulk-loading of R-tree indexes have the drawback that the quality of the resulting spatial index decreases considerably as the degree of parallelism increases. To solve this problem, a novel method of bulk-loading spatial data using the popular MapReduce framework is proposed. The method combines the Hilbert curve with random sampling to partition and sort the spatial data in parallel with MapReduce, which balances the number of spatial objects in each partition. A bottom-up method is then introduced to simplify and accelerate the construction of the sub-index in each partition. Three area metrics are used to evaluate the quality of the generated index for different numbers of partitions. Extensive experiments show that the generated R-trees are of similar quality to an R-tree built with a sequential bulk-loading method, while the execution time is reduced considerably by exploiting parallelism.
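The pipeline outlined in the summary (Hilbert keys, sampling-based partitioning, per-partition sorting, bottom-up sub-index construction) might look roughly like the following. This is a minimal single-process Python simulation under stated assumptions, not the authors' implementation: the map, shuffle, and reduce phases are plain loops, each object is keyed by the Hilbert value of its rectangle's center (one common choice), and the helper names (`hilbert_index`, `center_key`, `split_points`, `pack_bottom_up`, `bulk_load`) and parameters (grid order, node capacity, sample size) are hypothetical. The sketch stops at the per-partition sub-trees; how the paper merges them into one index is not described in the summary.

```python
import bisect
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def hilbert_index(order: int, x: int, y: int) -> int:
    """Hilbert distance of cell (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def center_key(r: Rect, world: Rect, order: int) -> int:
    """Hypothetical keying choice: Hilbert value of the rectangle's quantized center."""
    n = 1 << order
    cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
    gx = min(n - 1, int((cx - world[0]) / (world[2] - world[0]) * n))
    gy = min(n - 1, int((cy - world[1]) / (world[3] - world[1]) * n))
    return hilbert_index(order, gx, gy)


def split_points(keys: List[int], partitions: int, sample_size: int,
                 rng: random.Random) -> List[int]:
    """Estimate partitions-1 key boundaries from the quantiles of a random sample."""
    if not keys:
        return []
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[i * len(sample) // partitions] for i in range(1, partitions)]


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pack_bottom_up(rects: List[Rect], capacity: int) -> List[List[Rect]]:
    """Pack already-sorted rectangles into nodes level by level (leaves first).

    Each level holds the MBRs of groups of `capacity` consecutive entries from
    the level below; the last level holds the single root MBR.
    """
    assert capacity >= 2, "capacity must be at least 2 for the tree to shrink"
    levels = [rects]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([mbr(cur[i:i + capacity]) for i in range(0, len(cur), capacity)])
    return levels


def bulk_load(rects: List[Rect], world: Rect, partitions: int = 4, order: int = 10,
              capacity: int = 8, sample_size: int = 100,
              seed: int = 0) -> List[List[List[Rect]]]:
    rng = random.Random(seed)
    # "Map": compute a Hilbert key for every object.
    keyed = [(center_key(r, world, order), r) for r in rects]
    bounds = split_points([k for k, _ in keyed], partitions, sample_size, rng)
    # "Shuffle": route each object to the partition whose key range contains it.
    buckets: List[List[Tuple[int, Rect]]] = [[] for _ in range(partitions)]
    for k, r in keyed:
        buckets[bisect.bisect_right(bounds, k)].append((k, r))
    # "Reduce": sort each partition by key and build its sub-tree bottom-up.
    subtrees = []
    for bucket in buckets:
        bucket.sort(key=lambda kr: kr[0])
        subtrees.append(pack_bottom_up([r for _, r in bucket], capacity))
    return subtrees


if __name__ == "__main__":
    rng = random.Random(42)
    world = (0.0, 0.0, 100.0, 100.0)
    data = []
    for _ in range(1000):
        x, y = rng.uniform(0, 99), rng.uniform(0, 99)
        data.append((x, y, x + 1, y + 1))
    trees = bulk_load(data, world)
    print("objects per partition:", [len(t[0]) for t in trees])
```

Choosing boundaries from sample quantiles, rather than splitting the key space into equal ranges, is what keeps partition sizes roughly equal when the data are skewed; because each partition covers a contiguous Hilbert range, its sub-tree can be built with a single sort followed by sequential packing.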
Bibliography:	parallel bulk-loading; MapReduce; R-tree; query processing; 42-1405/N
ISSN:	1007-1202 1993-4998
DOI:	10.1007/s11859-011-0790-3