Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system

Bibliographic Details
Published in	Expert Systems with Applications, Vol. 37, no. 3, pp. 2486–2494
Main Authors	Yu, Kun-Ming; Zhou, Jiayi
Format	Journal Article
Language	English
Published	Elsevier Ltd 15.03.2010
Subjects	Association rules; Cluster computing; Data mining; Frequent pattern mining; Grid computing
ISSN	0957-4174 (print); 1873-6793 (electronic)
DOI	10.1016/j.eswa.2009.07.072

Abstract	The mining of frequent patterns from transaction-oriented databases is an important subject. Frequent patterns are fundamental in generating association rules, time series, etc. Most frequent pattern mining algorithms can be classified into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). In recent years, many frequent pattern mining techniques have been based on the FP-tree approach, since it needs only two database scans. However, for pattern growth methods, the execution time increases rapidly as the database grows or when the given minimum support is small. Parallel-distributed computing is therefore a good strategy for solving this problem. Some parallel algorithms have been proposed, but their execution time is still costly when the database is large. In this paper, two parallel mining algorithms are proposed for frequent pattern mining on PC Clusters and multi-cluster grids: the Tidset-based Parallel FP-tree (TPFP-tree) and the Balanced Tidset-based Parallel FP-tree (BTP-tree). To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. Since a Grid system is a heterogeneous computing environment, the proposed BTP-tree can balance the load according to the computing ability of the processors. BTP-tree, TPFP-tree and PFP-tree were implemented, and datasets generated with the IBM Quest Synthetic Data Generator were used to evaluate the performance of TPFP-tree and BTP-tree. The experimental results showed that, as the database size increased, the TPFP-tree needed less execution time on a PC Cluster than the PFP-tree. Moreover, on a multi-cluster grid, the BTP-tree shortened the execution time significantly and balanced the load better than both the TPFP-tree and the PFP-tree.
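The Tidset idea in the abstract can be illustrated with a small sketch. It is not the authors' TPFP-tree code: it assumes an in-memory database in which a transaction's TID is its list index, and the helper names (`build_tidsets`, `global_order`, `conditional_transactions`) are hypothetical. Each item is mapped to the set of TIDs that contain it, so a processor responsible for mining item i can fetch exactly the transactions listed in i's Tidset and keep only their prefix items (the frequent items ranked before i), which is the input an FP-growth-style conditional tree is built from, without rescanning the whole database.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Assumption: a transaction's TID is its index in `db`, and items within a
# transaction are distinct.
Transaction = List[str]


def build_tidsets(db: List[Transaction]) -> Dict[str, Set[int]]:
    """One pass over the database: map each item to the TIDs that contain it."""
    tidsets: Dict[str, Set[int]] = defaultdict(set)
    for tid, txn in enumerate(db):
        for item in txn:
            tidsets[item].add(tid)
    return dict(tidsets)


def global_order(tidsets: Dict[str, Set[int]], min_support: int) -> List[str]:
    """Frequent items in descending support order (ties broken by item name)."""
    frequent = [(item, len(tids)) for item, tids in tidsets.items()
                if len(tids) >= min_support]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def conditional_transactions(db: List[Transaction],
                             tidsets: Dict[str, Set[int]],
                             item: str,
                             order: List[str]) -> List[List[str]]:
    """Fetch only the transactions listed in item's Tidset (no full scan) and
    keep the frequent items ranked before `item`, i.e. its prefix paths."""
    rank = {it: r for r, it in enumerate(order)}
    if item not in rank:          # infrequent item: nothing to mine
        return []
    cutoff = rank[item]
    result: List[List[str]] = []
    for tid in sorted(tidsets.get(item, set())):
        # Infrequent items default to `cutoff` and are therefore dropped.
        prefix = sorted((it for it in db[tid] if rank.get(it, cutoff) < cutoff),
                        key=lambda it: rank[it])
        if prefix:                # empty prefixes add nothing to the conditional tree
            result.append(prefix)
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "d"], ["a", "b", "d"]]
    tidsets = build_tidsets(db)
    order = global_order(tidsets, min_support=2)
    print(order)                                              # ['a', 'b', 'c', 'd']
    print(conditional_transactions(db, tidsets, "d", order))  # [['b'], ['a', 'b']]
```

In a parallel setting, a processor assigned item `d` would only need the transactions whose TIDs appear in `tidsets["d"]`, which is the saving the Tidset is meant to provide over rescanning the database.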
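The record describes BTP-tree's load balancing only at a high level ("according to the computing ability of the processors"). As a stand-in, the sketch below uses a generic greedy longest-processing-time rule for processors of unequal speed; it is not the paper's scheme. It assumes each item's mining cost is already estimated (for example, by its Tidset size) and each processor has a positive relative capability. The function name `weighted_partition` and this cost model are hypothetical.

```python
from typing import Dict, List


def weighted_partition(work: Dict[str, float],
                       capability: List[float]) -> List[List[str]]:
    """Greedy longest-processing-time assignment on processors of unequal
    speed: take items in decreasing cost and give each one to the processor
    that would finish it earliest, i.e. the one minimising
    (current_load + cost) / capability."""
    if not capability or any(c <= 0 for c in capability):
        raise ValueError("capabilities must be positive")
    loads = [0.0] * len(capability)            # total assigned cost per processor
    parts: List[List[str]] = [[] for _ in capability]
    for item, cost in sorted(work.items(), key=lambda p: (-p[1], p[0])):
        # Ties on finish time go to the lower-numbered processor.
        best = min(range(len(capability)),
                   key=lambda p: ((loads[p] + cost) / capability[p], p))
        loads[best] += cost
        parts[best].append(item)
    return parts


if __name__ == "__main__":
    # Hypothetical costs (e.g. Tidset sizes); processor 0 is twice as fast.
    costs = {"a": 6, "b": 4, "c": 3, "d": 2, "e": 1}
    print(weighted_partition(costs, [2.0, 1.0]))  # [['a', 'c', 'd'], ['b', 'e']]
```

With these inputs the faster processor receives 11 units of work (finish time 5.5) and the slower one 5 units (finish time 5.0), so the estimated finish times stay close even though the raw workloads differ by more than a factor of two.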
Author	Yu, Kun-Ming (Department of Computer Science and Information Engineering, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC; yu@chu.edu.tw); Zhou, Jiayi (Institute of Engineering and Science, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC; jyzhou@pdlab.csie.chu.edu.tw)
Copyright	2009 Elsevier Ltd
Discipline	Computer Science
IsPeerReviewed	true
IsScholarly	true
License	https://www.elsevier.com/tdm/userlicense/1.0
PageCount	9
URI	https://dx.doi.org/10.1016/j.eswa.2009.07.072 https://www.proquest.com/docview/36167411