Integrated method for distributed processing of large XML data

Bibliographic Details
Published in: Cluster Computing, Vol. 27, no. 2, pp. 1375–1399
Main Authors: Chen, Rongxin; Cai, Guorong; Chen, Jie; Hong, Yuling
Format: Journal Article
Language: English
Published: New York: Springer US, 01.04.2024; Springer Nature B.V.

Summary: Traditional standalone computing struggles to process large XML data because of its limited scalability, so distributed processing on cluster systems has become an inevitable choice. Current distributed XML processing methods generally rely on existing general-purpose distributed computing frameworks, which suffer from complex configuration, inflexible working mechanisms, and difficult performance optimization given XML's semi-structured nature and complex queries. In addition, distributed XML querying offers little automatic processing and is not effectively integrated with distributed XML parsing and indexing. In this paper we propose an integrated method for distributed processing of large XML data, called dXML. Our method supports distributed parsing of arbitrary XML fragments and distributed index creation, and adopts efficient navigational XPath evaluation based on a relation index. Through a distributed XPath evaluation approach based on filter-upon-pre-evaluate, our method achieves data locality and reduces network traffic when evaluating complex XPath predicates. dXML integrates distributed XML parsing, index creation, and XPath querying into a one-stop XML processing solution that supports automatic distributed processing of large XML data with lightweight configuration and a flexible working mechanism. Experimental evaluation verifies the effectiveness of dXML, and comparative results show that dXML achieves better distributed query performance than typical existing distributed processing methods of both the navigational and the twig-based kind.
ISSN: 1386-7857, 1573-7543
DOI: 10.1007/s10586-023-04010-0
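The summary names dXML's techniques without specifying them. As a rough illustration only, the Python sketch below shows one generic way a navigational XPath query such as `//book[isbn]/title` could be evaluated over a region-labelled element index that is split across workers, with the predicate pre-evaluated on each partition so that only small node labels, not document data, need to be exchanged. The (start, end, level) region encoding, the contiguous partitioning, the loop that simulates workers, and all helper names (`build_region_index`, `partition`, `filter_has_child`, `child_step`) are assumptions for illustration; they are not the paper's relation index or its filter-upon-pre-evaluate algorithm.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Set

# Hypothetical helpers for illustration; not the dXML implementation.


class Node(NamedTuple):
    tag: str
    start: int  # pre-order rank of the element
    end: int    # largest pre-order rank inside the element's subtree
    level: int  # depth, root = 0


def build_region_index(xml_text: str) -> List[Node]:
    """Label every element with a (start, end, level) region via a DFS."""
    root = ET.fromstring(xml_text)
    nodes: List[Node] = []
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        nodes.append(Node(elem.tag, start, counter - 1, level))

    visit(root, 0)
    return sorted(nodes, key=lambda n: n.start)


def is_child(parent: Node, node: Node) -> bool:
    # Region containment plus a one-level depth difference means parent/child.
    return parent.start < node.start <= parent.end and node.level == parent.level + 1


def partition(index: List[Node], k: int) -> List[List[Node]]:
    """Split the index into at most k contiguous pre-order ranges (simulated workers)."""
    size = -(-len(index) // k)  # ceiling division
    return [index[i:i + size] for i in range(0, len(index), size)]


def filter_has_child(parts: List[List[Node]], context: List[Node], tag: str) -> List[Node]:
    # Each simulated worker pre-evaluates the predicate on its local partition
    # and reports only the start ranks of context nodes that pass; we union them.
    passed: Set[int] = set()
    for local in parts:
        passed |= {c.start for c in context
                   for n in local if n.tag == tag and is_child(c, n)}
    return [c for c in context if c.start in passed]


def child_step(parts: List[List[Node]], context: List[Node], tag: str) -> List[Node]:
    # Each simulated worker returns its local children of the broadcast context.
    out = [n for local in parts for n in local
           if n.tag == tag and any(is_child(c, n) for c in context)]
    return sorted(out, key=lambda n: n.start)


if __name__ == "__main__":
    doc = ("<lib><book><isbn/><title/></book><book><title/></book>"
           "<shelf><book><isbn/><title/></book></shelf></lib>")
    parts = partition(build_region_index(doc), k=3)
    books = [n for local in parts for n in local if n.tag == "book"]  # //book
    books = filter_has_child(parts, books, "isbn")                    # [isbn]
    titles = child_step(parts, books, "title")                        # /title
    print([t.start for t in titles])  # [3, 9]
```

Broadcasting the (small) context labels to every partition keeps each step correct even when a parent and its children fall into different partitions, at the cost of scanning every partition; a real system would likely prune partitions by region ranges first.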