Association rule mining with modified apriori algorithm using top down approach

Data Mining is a field of computer science that is concerned with extracting useful information from varied sources. In an era where information has become the inherent necessity of human beings, its increased relevance and usefulness has taken focus as need of the hour. The most important part of t...

Full description

Saved in:
Bibliographic Details
Published in2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) pp. 747 - 752
Main Author Shah, Ashish
Format Conference Proceeding
LanguageEnglish
Published IEEE 2016
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Data Mining is a field of computer science that is concerned with extracting useful information from varied sources. In an era where information has become the inherent necessity of human beings, its increased relevance and usefulness has taken focus as need of the hour. The most important part of this association rule mining is the mining of item sets that are frequent. Market basket analysis is done by companies in order to retrieve itemsets that are frequent and often used together by customers. Apriori algorithm is a widely used technique in order to find those combinations of itemsets. However, when any of these frequent itemsets increases in length, the algorithm needs to pass through many iterations and, as a result, the performance drastically decreases. In this paper, we propose a modification to the apriori algorithm by using a hash function which divides the frequent item sets into buckets. Further, we propose a novel technique to be used in conjunction with the apriori algorithm by eliminating infrequent itemsets from the candidate set. In this top down approach, it finds the frequent itemsets without going through several iterations, thus saving time and space. By discovering a large maximal frequent itemset very early in the algorithm, all its subsets are also frequent hence we no longer need to scan them. Clearly, the proposed technique has an advantage over the existing apriori algorithm when the most frequent itemset's length is long.
DOI:10.1109/ICATCCT.2016.7912099