Fast Frequent Free Tree Mining in Graph Databases

Bibliographic Details
Published in World wide web (Bussum) Vol. 11; no. 1; pp. 71-92
Main Authors Zhao, Peixiang; Yu, Jeffrey Xu
Format Journal Article
Language English
Published Boston: Springer US, 01.03.2008
Summary: A free tree, a special kind of undirected, acyclic, connected graph, is used extensively in computational biology, pattern recognition, computer networks, XML databases, etc. In this paper, we present a computationally efficient algorithm, F3TM (Fast Frequent Free Tree Mining), to find all frequently occurring free trees in a graph database. The two key steps of F3TM are candidate generation and frequency counting. The frequency counting step computes how many graphs in the database contain a candidate frequent free tree, which is proved to be, in essence, the subgraph isomorphism problem and is NP-complete. Therefore, the key issue becomes how to reduce the number of false positives in the candidate generation step. Based on our observations, the cost of false-positive reduction can itself be prohibitive. In this paper, we focus on how to reduce the candidate generation cost and minimize the number of infrequent candidates generated. We prove a theorem that the complete set of frequent free trees can be discovered from a graph database by growing vertices on a limited range of positions of a free tree. We propose two pruning algorithms, automorphism-based pruning and canonical mapping-based pruning, which significantly reduce the candidate generation cost. We conducted extensive experimental studies using a real application dataset and a synthetic dataset. The experimental results show that F3TM outperforms state-of-the-art algorithms by an order of magnitude in mining frequent free trees in large graph databases.
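The frequency counting step described in the summary can be illustrated with a minimal Python sketch. This is not F3TM and uses none of its pruning techniques; it is a plain backtracking search, given only to show what "counting how many graphs contain a candidate free tree" means. The names `make_graph`, `occurs_in`, `support`, and `is_frequent`, the vertex-labeled graph representation, and the example data are all assumptions made for illustration.

```python
def make_graph(labels, edges):
    """Build (labels, adjacency) for an undirected vertex-labeled graph."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def occurs_in(tree, graph):
    """True if the non-empty free tree occurs in graph as a label-preserving subgraph."""
    t_labels, t_adj = tree
    g_labels, g_adj = graph

    # Visit tree vertices so that every vertex after the first
    # already has a neighbor (its parent) earlier in the order.
    order, seen = [], set()
    stack = [next(iter(t_labels))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        stack.extend(t_adj[v] - seen)

    def extend(i, mapping):
        if i == len(order):
            return True  # every tree vertex has been mapped
        v = order[i]
        placed = [u for u in t_adj[v] if u in mapping]
        # The root may map anywhere; other vertices must sit next to
        # the image of their already-mapped parent.
        candidates = g_adj[mapping[placed[0]]] if placed else list(g_adj)
        used = set(mapping.values())
        for c in candidates:
            if c in used or g_labels[c] != t_labels[v]:
                continue
            if all(mapping[u] in g_adj[c] for u in placed):
                mapping[v] = c
                if extend(i + 1, mapping):
                    return True
                del mapping[v]  # backtrack
        return False

    return extend(0, {})


def support(tree, database):
    """Number of database graphs that contain the tree."""
    return sum(1 for g in database if occurs_in(tree, g))


def is_frequent(tree, database, min_support):
    return support(tree, database) >= min_support


# Hypothetical example: the path A-B-C as the candidate free tree.
pattern = make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)])
database = [
    make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),
    make_graph({0: "A", 1: "B"}, [(0, 1)]),
    make_graph({0: "C", 1: "B", 2: "A", 3: "B"}, [(0, 1), (1, 2), (2, 3)]),
]
print(support(pattern, database))
```

Because each containment test is a subgraph isomorphism check, its worst-case cost grows exponentially with the pattern size, which is why the summary stresses generating fewer infrequent candidates before this step is ever run.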
ISSN: 1386-145X, 1573-1413
DOI: 10.1007/s11280-007-0031-z