CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams

Bibliographic Details
Published in: Information Sciences, Vol. 278, pp. 559-576
Main Authors: Shin, Se Jung; Lee, Dae Su; Lee, Won Suk
Format: Journal Article
Language: English
Published: Elsevier Inc, 10.09.2014
Summary: Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes the CP-tree (Compressible-prefix tree), which can be used to find frequent itemsets over an online data stream. Unlike a node of a prefix tree, a node of a CP-tree can maintain a concise synopsis that traces the supports of several itemsets together. As the number of itemsets traced by each node increases, the CP-tree becomes smaller, but its result becomes less accurate, because the estimated supports of itemsets that are traced together by one node may contain false positive or false negative errors. Based on this trade-off, the size of a CP-tree can be controlled by merging or splitting its nodes, so that the confined memory space is utilized as fully as possible. The accuracy of a CP-tree is therefore maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that collectively represent the original set of frequent itemsets.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2014.03.074
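
The size/accuracy trade-off described in the summary can be illustrated with a small Python sketch. This is not the paper's CP-tree algorithm: it is a flat, toy model in which each hypothetical `SynopsisNode` traces a group of itemsets with one shared count range, and `compress` merges nodes until a node budget is met. The names `SynopsisNode`, `exact_counts`, `compress`, and `estimate_of`, the midpoint estimate, and the merge heuristic are all assumptions made for illustration; node splitting and the prefix-tree layout are omitted.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SynopsisNode:
    """Hypothetical node tracing several itemsets with one shared synopsis."""
    itemsets: set       # frozensets traced together by this node
    min_count: int = 0  # smallest count among the traced itemsets
    max_count: int = 0  # largest count among the traced itemsets

    def estimate(self) -> float:
        # Assumption: every traced itemset receives the midpoint as its estimate.
        return (self.min_count + self.max_count) / 2


def exact_counts(transactions, max_len=2):
    """Count every itemset of up to max_len items (exact baseline)."""
    counts = {}
    for t in transactions:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(t), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def merge(a, b):
    """Combine two nodes; the shared count range widens to cover both."""
    return SynopsisNode(a.itemsets | b.itemsets,
                        min(a.min_count, b.min_count),
                        max(a.max_count, b.max_count))


def compress(nodes, budget):
    """Merge count-adjacent nodes until at most `budget` nodes remain."""
    nodes = sorted(nodes, key=lambda n: n.min_count)
    while len(nodes) > budget and len(nodes) > 1:
        # Choose the adjacent pair whose merged range would be narrowest.
        i = min(range(len(nodes) - 1),
                key=lambda j: nodes[j + 1].max_count - nodes[j].min_count)
        nodes[i:i + 2] = [merge(nodes[i], nodes[i + 1])]
    return nodes


def estimate_of(nodes, itemset):
    """Return the estimated count of an itemset, or 0.0 if it is not traced."""
    for n in nodes:
        if itemset in n.itemsets:
            return n.estimate()
    return 0.0


if __name__ == "__main__":
    txs = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}]
    counts = exact_counts(txs)
    nodes = [SynopsisNode({s}, c, c) for s, c in counts.items()]
    small = compress(nodes, budget=2)
    for s, c in counts.items():
        print(sorted(s), "exact:", c, "estimated:", estimate_of(small, s))
```

In this toy run, five single-itemset nodes are compressed to two. Itemsets whose exact count is 2 and itemsets whose exact count is 1 end up sharing an estimate of 1.5, so a minimum-support threshold of 2 would miss the former (a false negative), while a threshold of 1.5 would admit the latter (a false positive). Raising the budget keeps more nodes and narrows the ranges, which is the trade-off the summary describes.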