Parallel labeling of massive XML data with MapReduce

Bibliographic Details
Published in	The Journal of Supercomputing, Vol. 67, no. 2, pp. 408-437
Main Authors	Choi, Hyebong; Lee, Kyong-Ha; Lee, Yoon-Joon
Format	Journal Article
Language	English
Published	Boston: Springer US, 01.02.2014
Subjects	Algorithms; Compilers; Computer Science; Extensible Markup Language; Interpreters; Labels; Marking; Processor Architectures; Programming Languages; Skewness; Trees; Workload; XML; Parallel computing; Tree labeling algorithm; MapReduce
Summary:	The volume of XML data has become enormous and continues to grow quickly, as much data is encoded in XML by virtue of its simplicity and extensibility. While tree labeling algorithms play a crucial role in XML query processing, conventional algorithms are all sequential and therefore fail to label a large volume of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data. Specifically, we focus on how to efficiently label a single large XML file in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartitioning to solve performance issues caused by data skewness and MapReduce’s inherent limitations. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms while remaining robust to data skewness.
ISSN:	0920-8542, 1573-0484
DOI:	10.1007/s11227-013-1008-6