The Parallel Improved Apriori Algorithm Research Based on Spark

Bibliographic Details
Published in International Conference on Frontier of Computer Science and Technology (Print), pp. 354-359
Main Authors Yang, Shaosong; Xu, Guoyan; Wang, Zhijian; Zhou, Fachao
Format Conference Proceeding
Language English
Published IEEE 01.08.2015
ISSN 2159-6301
DOI 10.1109/FCST.2015.28

Summary: The Apriori algorithm is one of the classical algorithms in the field of association rule mining. This paper analyzes the shortcomings of the classical Apriori algorithm, then improves it by constructing a new data structure and optimizing the pre-pruning step. Building on the improved Apriori algorithm and on Spark's support for fine-grained data processing, we describe how the improved algorithm can be parallelized and propose the SIAP algorithm. Experiments comparing SIAP with Apriori implementations based on Hadoop and on Spark show that SIAP is more efficient.