CP-tree: An adaptive synopsis structure for compressing frequent itemsets over online data streams

Bibliographic Details
Published in	Information sciences Vol. 278; pp. 559 - 576
Main Authors	Shin, Se Jung, Lee, Dae Su, Lee, Won Suk
Format	Journal Article
Language	English
Published	Elsevier Inc 10.09.2014
Subjects	Data mining; Data stream; Frequent itemset; Frequent itemset compression; Stream data mining
Summary:	Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes the CP-tree (Compressible-prefix tree), which can be effectively employed to find frequent itemsets over an online data stream. Unlike a node of a prefix tree, a node of a CP-tree can maintain a concise synopsis that is used to trace the supports of several itemsets together. As the number of itemsets traced by a single node increases, the CP-tree becomes smaller. However, its result also becomes less accurate, since the estimated supports of itemsets that are traced together by one node may contain false positive or false negative errors. Based on this characteristic, the size of a CP-tree can be controlled by merging or splitting its nodes, which allows a confined memory space to be utilized as fully as possible. Therefore, the accuracy of a CP-tree is maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that collectively represent the original set of frequent itemsets.
ISSN:	0020-0255 1872-6291
DOI:	10.1016/j.ins.2014.03.074
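
The trade-off between size and accuracy described in the summary can be illustrated with a small Python sketch. It is not the paper's algorithm: the `MergedNode` class, the `merge_by_gap` grouping rule, and the midpoint estimator are all assumptions made here for illustration only. Each merged node keeps just the largest and smallest count of the itemsets it traces, so fewer nodes mean a coarser support estimate.

```python
from dataclasses import dataclass


@dataclass
class MergedNode:
    """Hypothetical synopsis node: two counts stand in for several itemsets."""
    itemsets: list   # itemsets traced together by this node
    max_count: int   # largest exact count among them
    min_count: int   # smallest exact count among them

    def estimate(self, itemset):
        # Assumed estimator: midpoint of the two stored counts.
        # It ignores which itemset is asked for, which is the source of error.
        return (self.max_count + self.min_count) / 2


def merge_by_gap(exact_counts, gap):
    """Greedily group itemsets whose counts lie within `gap` of a node's largest count."""
    nodes = []
    # Visit itemsets from the most to the least frequent.
    for itemset, count in sorted(exact_counts.items(), key=lambda kv: -kv[1]):
        last = nodes[-1] if nodes else None
        if last is not None and last.max_count - count <= gap:
            last.itemsets.append(itemset)
            last.min_count = count  # counts arrive in descending order
        else:
            nodes.append(MergedNode([itemset], count, count))
    return nodes


def max_error(nodes, exact_counts):
    """Worst absolute difference between estimated and exact support."""
    return max(
        abs(node.estimate(s) - exact_counts[s])
        for node in nodes
        for s in node.itemsets
    )


# Toy exact counts (made-up numbers, not data from the paper).
counts = {
    frozenset("a"): 10, frozenset("b"): 9, frozenset("ab"): 8,
    frozenset("c"): 5, frozenset("ac"): 4, frozenset("bc"): 2,
}

for gap in (0, 2, 10):
    nodes = merge_by_gap(counts, gap)
    print(gap, len(nodes), max_error(nodes, counts))
```

With these toy counts, a larger `gap` yields fewer nodes and a larger worst-case estimation error. Choosing `gap` to fit a memory budget loosely mirrors the merge and split control that the summary describes.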