Efficient mining of frequent XML query patterns with repeating-siblings

Bibliographic Details
Published in	Information and Software Technology, Vol. 50, no. 5, pp. 375-389
Main Authors	Yang, Liang Huai; Lee, Mong Li; Hsu, Wynne; Huang, Decai; Wong, Limsoon
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V.; Elsevier Science Ltd; 01.04.2008
Subjects	Algorithms; Data mining; Extensible Markup Language; Frequent pattern mining; Structured pattern; Studies; Systems design; Tree pattern mining; XML query pattern
Summary:	A recent approach to improving the performance of XML query evaluation is to cache the query results of frequent query patterns. Unfortunately, discovering these frequent query patterns is an expensive operation. In this paper, we develop a two-pass mining algorithm, 2PXMiner, that guarantees the discovery of frequent query patterns by scanning the database at most twice. By exploiting a transaction summary data structure and an enumeration tree, we are able to determine upper bounds on the frequencies of the candidate patterns and to quickly prune away the infrequent patterns. We also design an index to trace the repeating candidate subtrees generated by sibling repetition, thus avoiding redundant computations. Experimental results indicate that 2PXMiner is both efficient and scalable.
ISSN:	0950-5849, 1873-6025
DOI:	10.1016/j.infsof.2007.02.019