Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system

Bibliographic Details
Published in Expert Systems with Applications, Vol. 37, no. 3, pp. 2486-2494
Main Authors Yu, Kun-Ming, Zhou, Jiayi
Format Journal Article
Language English
Published Elsevier Ltd 15.03.2010
ISSN 0957-4174
EISSN 1873-6793
DOI 10.1016/j.eswa.2009.07.072

Abstract Mining frequent patterns from transaction-oriented databases is an important problem. Frequent patterns are fundamental to generating association rules, time series, and similar models. Most frequent pattern mining algorithms fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). In recent years, many techniques for frequent pattern mining have been based on the FP-tree approach because it needs only two database scans. However, the execution time of pattern growth methods increases rapidly as the database grows or as the given support threshold decreases. Parallel and distributed computing is therefore a good strategy for this problem. Some parallel algorithms have been proposed, but their execution time is still high when the database is large. This paper proposes two parallel mining algorithms, Tidset-based Parallel FP-tree (TPFP-tree) and Balanced Tidset-based Parallel FP-tree (BTP-tree), for frequent pattern mining on PC clusters and multi-cluster grids. To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. Because a grid system is a heterogeneous computing environment, the proposed BTP-tree balances the load according to the computing ability of the processors. BTP-tree, TPFP-tree and PFP-tree were implemented, and datasets generated with the IBM Quest Synthetic Data Generator were used to evaluate the performance of TPFP-tree and BTP-tree. The experimental results showed that the TPFP-tree needed less execution time on a PC cluster than the PFP-tree as the database size increased. Moreover, the BTP-tree shortened the execution time significantly and balanced the load better than both the TPFP-tree and PFP-tree on a multi-cluster grid.
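The two ideas named in the abstract, selecting transactions through a Tidset rather than rescanning the database and weighting the workload by processor capability, can be illustrated with a small sketch. This is not the authors' implementation: the record does not describe the algorithms in detail, so the function names (`build_tidsets`, `select_transactions`, `assign_items`), the use of tidset size as a cost estimate, and the greedy assignment rule are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs containing it (one scan)."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, item):
    """Fetch the transactions containing `item` by ID, without rescanning."""
    return [transactions[tid] for tid in sorted(tidsets.get(item, ()))]


def assign_items(item_costs, speeds):
    """Hypothetical balancing rule: hand each item (heaviest first) to the
    processor whose speed-normalised load would be smallest afterwards."""
    loads = [0.0] * len(speeds)
    assignment = {p: [] for p in range(len(speeds))}
    ordered = sorted(item_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, cost in ordered:
        best = min(range(len(speeds)),
                   key=lambda i: (loads[i] + cost) / speeds[i])
        assignment[best].append(item)
        loads[best] += cost
    return assignment


# Toy database of four transactions (illustrative data, not from the paper).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "d"},
]
tidsets = build_tidsets(transactions)

# Assumed cost model: an item's work is proportional to its tidset size.
item_costs = {item: len(tids) for item, tids in tidsets.items()}

# Two processors, the first assumed to be twice as fast as the second.
assignment = assign_items(item_costs, speeds=[2.0, 1.0])
```

In this sketch, a processor that owns item `d` would call `select_transactions(transactions, tidsets, "d")` to pull exactly the transactions it needs, and the faster processor ends up with more items than the slower one.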
Author Yu, Kun-Ming
Zhou, Jiayi
Author_xml – sequence: 1
  givenname: Kun-Ming
  surname: Yu
  fullname: Yu, Kun-Ming
  email: yu@chu.edu.tw
  organization: Department of Computer Science and Information Engineering, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC
– sequence: 2
  givenname: Jiayi
  surname: Zhou
  fullname: Zhou, Jiayi
  email: jyzhou@pdlab.csie.chu.edu.tw
  organization: Institute of Engineering and Science, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC
Copyright 2009 Elsevier Ltd
Discipline Computer Science
IsPeerReviewed true
IsScholarly true
Keywords Frequent pattern mining
Grid computing
Association rules
Data mining
Cluster computing
License https://www.elsevier.com/tdm/userlicense/1.0