Index-BitTableFI: An improved algorithm for mining frequent itemsets

Bibliographic Details
Published in	Knowledge-Based Systems, Vol. 21, no. 6, pp. 507-513
Main Authors	Song, Wei; Yang, Bingru; Xu, Zhangyan
Format	Journal Article
Language	English
Published	Elsevier B.V., 01.08.2008
Subjects	Association rule; BitTable; Data mining; Frequent itemset; Index array; Subsume index
Summary:	Efficient algorithms for mining frequent itemsets are crucial for mining association rules and for many other data mining tasks. Several frequent-itemset mining methods have been implemented using a BitTable structure. BitTableFI is a recently proposed, efficient BitTable-based algorithm that exploits the BitTable both horizontally and vertically. Although it uses efficient bitwise operations, BitTableFI may still suffer from the high cost of candidate generation and testing. To address this problem, a new algorithm, Index-BitTableFI, is proposed. Index-BitTableFI also uses the BitTable horizontally and vertically. To use the BitTable horizontally, an index array and a corresponding computing method are proposed. By computing the subsume index, the itemsets that co-occur with a representative item can be identified quickly in one pass using breadth-first search. Then, for the itemsets generated through the index array, a depth-first search strategy is used to generate all other frequent itemsets. A hybrid search is thus implemented, and the search space is greatly reduced. The proposed method has two advantages. First, redundant tidset intersections and frequency checks are largely avoided. Second, it is proved that the frequent itemsets that include a representative item and have the same support as that item can be identified directly by joining the representative item with every combination of the items in its subsume index. The cost of processing such itemsets is therefore lower, and efficiency improves. Experimental results show that the proposed algorithm is efficient, especially on dense datasets.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2008.03.011
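
Illustrative sketch (not taken from the article): the summary describes storing tidsets as bit vectors and using a subsume index to find, without further intersections, the itemsets that share a representative item's support. The Python below is a minimal sketch of that idea under stated assumptions. It assumes the subsume index of an item is the set of other items whose tidset contains that item's tidset; the function names (build_bit_table, subsume_index, same_support_itemsets, itemset_support) are hypothetical, and any item ordering or minimum-support filtering that the full algorithm applies is omitted.

```python
from functools import reduce
from itertools import combinations
from operator import and_


def build_bit_table(transactions):
    """Vertical bit table: item -> int whose bit t is set if transaction t has the item."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table[item] = table.get(item, 0) | (1 << tid)
    return table


def itemset_support(table, itemset):
    """Support of a non-empty itemset: AND the tidset bit vectors, then count set bits."""
    bits = reduce(and_, (table[item] for item in itemset))
    return bin(bits).count("1")


def subsume_index(table, item):
    """Assumed definition: other items whose tidset is a superset of `item`'s tidset."""
    base = table[item]
    return sorted(
        other
        for other, bits in table.items()
        if other != item and (base & bits) == base
    )


def same_support_itemsets(table, item):
    """Join `item` with every combination of the items in its subsume index.

    Each result has the same support as `item`, because every added item
    occurs in all transactions that contain `item`.
    """
    subsumed = subsume_index(table, item)
    result = [(item,)]
    for size in range(1, len(subsumed) + 1):
        for combo in combinations(subsumed, size):
            result.append((item,) + combo)
    return result


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c", "d"},
    {"b"},
]
table = build_bit_table(transactions)
for itemset in same_support_itemsets(table, "d"):
    print(itemset, itemset_support(table, itemset))
```

In this toy database every transaction containing d also contains a, b and c, so all itemsets built from d and any combination of those three items are emitted directly; the support computation in the loop is only there to check the result, not because the enumeration needs it.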