From sequential pattern mining to structured pattern mining: A pattern-growth approach

Bibliographic Details
Published in: Journal of Computer Science and Technology, Vol. 19, no. 3, pp. 257-279
Main Authors: Han, Jia-Wei; Pei, Jian; Yan, Xi-Feng
Format: Journal Article
Language: English
Published: Beijing, Springer Nature B.V., 01.05.2004
University of Illinois at Urbana-Champaign, Urbana, IL 61801, U.S.A.; State University of New York at Buffalo, Buffalo, NY 14260-2000, U.S.A.
Summary: Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
ISSN: 1000-9000, 1860-4749
DOI: 10.1007/BF02944897