The Parallel Improved Apriori Algorithm Research Based on Spark

Bibliographic Details
Published in: International Conference on Frontier of Computer Science and Technology (FCST) (Print), pp. 354-359
Main Authors: Yang, Shaosong; Xu, Guoyan; Wang, Zhijian; Zhou, Fachao
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2015
ISSN: 2159-6301
DOI: 10.1109/FCST.2015.28

Abstract: The Apriori algorithm is one of the classical algorithms in association rule mining. This paper analyzes the shortcomings of the classical Apriori algorithm and then improves it by constructing a new data structure and optimizing the prepruning step. Building on the improved Apriori algorithm and on Spark's support for fine-grained data processing, we describe how the improved algorithm can be processed in parallel and propose the SIAP algorithm. In experiments comparing SIAP with Hadoop-based and Spark-based Apriori implementations, SIAP showed higher efficiency.
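
The abstract does not specify the new data structure or the exact change to prepruning, so the sketch below illustrates only the classical Apriori steps that SIAP builds on. These are level-wise candidate generation, the standard prune that discards any candidate with an infrequent (k-1)-subset, and support counting split across data partitions and then merged. That last step is the pattern a Spark job would typically express with RDD operations such as `mapPartitions` and `reduceByKey`. This is plain Python, not the authors' SIAP algorithm. The function names (`count_partition`, `generate_candidates`, `apriori`) and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def count_partition(partition: List[Set[str]], candidates: List[Itemset]) -> Counter:
    """Count how many transactions in one partition contain each candidate."""
    counts: Counter = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def generate_candidates(frequent: Set[Itemset], k: int) -> List[Itemset]:
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    that has an infrequent (k-1)-subset (classical Apriori pruning)."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return [c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))]


def apriori(partitions: List[List[Set[str]]], min_support: int) -> dict:
    """Level-wise Apriori over partitioned data; returns {itemset: support}."""
    candidates = list({frozenset([i]) for part in partitions for t in part for i in t})
    result: dict = {}
    k = 1
    while candidates:
        total: Counter = Counter()
        # Hypothetical stand-in for Spark: each partition is counted separately
        # (a separate task in Spark) and the partial counts are merged; here the
        # partitions are processed one after another.
        for part in partitions:
            total.update(count_partition(part, candidates))
        frequent = {c: n for c, n in total.items() if n >= min_support}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


# Usage on a toy data set split into two hypothetical partitions.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
partitions = [transactions[:3], transactions[3:]]
frequent = apriori(partitions, min_support=3)
print(len(frequent))  # 8
```

With a minimum support of 3, the frequent itemsets are the four single items bread, milk, diaper, and beer, plus the four pairs {bread, milk}, {bread, diaper}, {milk, diaper}, and {diaper, beer}. The only surviving triple candidate, {bread, milk, diaper}, has support 2. Counting within each partition and merging afterwards means that only candidate counts, not transactions, have to cross partition boundaries. This is the usual motivation for running Apriori-style counting on Spark or Hadoop.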
Affiliation (all authors): Coll. of Comput. & Inf., Hohai Univ., Nanjing, China
Author emails: Yang, Shaosong (489271346@qq.com); Xu, Guoyan (gy_xu@126.com); Wang, Zhijian (zhjwang@hhu.edu.cn); Zhou, Fachao (790428547@qq.com)
EISBN: 9781467392952, 1467392952, 1467392944, 9781467392945
Subjects: Algorithm design and analysis; Apriori; association rule; Clustering algorithms; Data mining; efficiency; Heuristic algorithms; Itemsets; parallel; Spark; Sparks
URI: https://ieeexplore.ieee.org/document/7314705