Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system
The mining of frequent patterns from transaction-oriented databases is an important subject. Frequent patterns are fundamental in generating association rules, time series, etc. Most frequent pattern mining algorithms can be classified into two categories: generate-and-test approach (Apriori-like) a...
Published in | Expert systems with applications Vol. 37; no. 3; pp. 2486 - 2494 |
---|---|
Main Authors | Yu, Kun-Ming; Zhou, Jiayi |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 15.03.2010 |
Subjects | |
ISSN | 0957-4174 1873-6793 |
DOI | 10.1016/j.eswa.2009.07.072 |
Abstract | The mining of frequent patterns from transaction-oriented databases is an important subject. Frequent patterns are fundamental in generating association rules, time series, etc. Most frequent pattern mining algorithms can be classified into two categories: generate-and-test approach (Apriori-like) and pattern growth approach (FP-tree). In recent years, many techniques have been proposed for frequent pattern mining based on the FP-tree approach since it only needs two database scans. However, for pattern growth methods, the execution time increases rapidly when the database size increases or when the given support is small. Therefore, parallel-distributed computing is a good strategy for solving this problem. Some parallel algorithms have been proposed, but the execution time is still costly when the database size is large. In this paper, two parallel mining algorithms are proposed; Tidset-based Parallel FP-tree (TPFP-tree) and Balanced Tidset-based Parallel FP-tree (BTP-tree) for frequent pattern mining on PC Clusters and multi-cluster grids. In order to exchange transactions efficiently, a transaction identification set (Tidset) was used to directly select transactions instead of scanning the database. Since a Grid system is a heterogeneous computing environment, the proposed BTP-tree can balance the loading according to the computing ability of the processors. BTP-tree, TPFP-tree and PFP-tree were implemented, and datasets generated with an IBM Quest Synthetic Data Generator were used to verify the performance of TPFP-tree and BTP-tree. The experimental results showed that the TPFP-tree needed less execution time on a PC Cluster than the PFP-tree when the database increased. Moreover, the BTP-tree shortened the execution time significantly and had a better load balance capability than both the TPFP-tree and PFP-tree on a multi-cluster grid. |
---|---|
Author | Yu, Kun-Ming; Zhou, Jiayi |
Author_xml | – sequence: 1 givenname: Kun-Ming surname: Yu fullname: Yu, Kun-Ming email: yu@chu.edu.tw organization: Department of Computer Science and Information Engineering, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC – sequence: 2 givenname: Jiayi surname: Zhou fullname: Zhou, Jiayi email: jyzhou@pdlab.csie.chu.edu.tw organization: Institute of Engineering and Science, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC |
CitedBy_id | crossref_primary_10_1016_j_im_2015_02_004 crossref_primary_10_1186_s40537_018_0129_4 crossref_primary_10_1109_TKDE_2016_2515622 crossref_primary_10_1016_j_ins_2018_08_009 crossref_primary_10_1080_17445760_2014_927470 crossref_primary_10_1109_TSMC_2015_2437327 crossref_primary_10_1007_s11227_015_1566_x crossref_primary_10_1016_j_eswa_2019_112874 crossref_primary_10_1016_j_eswa_2011_11_095 crossref_primary_10_1016_j_jksuci_2020_04_008 crossref_primary_10_1016_j_eswa_2011_08_018 crossref_primary_10_1016_j_jpdc_2020_05_017 crossref_primary_10_1016_j_eswa_2011_01_107 crossref_primary_10_1016_j_eswa_2021_116435 crossref_primary_10_1016_j_datak_2019_101721 crossref_primary_10_1016_j_ins_2021_08_070 crossref_primary_10_1016_j_knosys_2013_04_004 crossref_primary_10_1080_02522667_2017_1372143 crossref_primary_10_1007_s10586_017_1249_x crossref_primary_10_1016_j_eswa_2023_121321 crossref_primary_10_1007_s10586_017_1609_6 crossref_primary_10_1016_j_knosys_2014_02_002 |
ContentType | Journal Article |
Copyright | 2009 Elsevier Ltd |
Copyright_xml | – notice: 2009 Elsevier Ltd |
DBID | AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
DOI | 10.1016/j.eswa.2009.07.072 |
DatabaseName | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Computer and Information Systems Abstracts |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 1873-6793 |
EndPage | 2494 |
ExternalDocumentID | 10_1016_j_eswa_2009_07_072 S0957417409007386 |
GroupedDBID | --K --M .DC .~1 0R~ 13V 1B1 1RT 1~. 1~5 29G 4.4 457 4G. 5GY 5VS 7-5 71M 8P~ 9JN 9JO AAAKF AAAKG AABNK AACTN AAEDT AAEDW AAIAV AAIKJ AAKOC AALRI AAOAW AAQFI AAQXK AARIN AAXUO AAYFN ABBOA ABFNM ABKBG ABMAC ABMVD ABUCO ABXDB ABYKQ ACDAQ ACGFS ACHRH ACNNM ACNTT ACRLP ACZNC ADBBV ADEZE ADJOM ADMUD ADTZH AEBSH AECPX AEKER AENEX AFKWA AFTJW AGHFR AGJBL AGUBO AGUMN AGYEJ AHHHB AHJVU AHZHX AIALX AIEXJ AIKHN AITUG AJBFU AJOXV ALEQD ALMA_UNASSIGNED_HOLDINGS AMFUW AMRAJ AOUOD APLSM ASPBG AVWKF AXJTR AZFZN BJAXD BKOJK BLXMC BNSAS CS3 DU5 EBS EFJIC EFLBG EJD EO8 EO9 EP2 EP3 F5P FDB FEDTE FGOYB FIRID FNPLU FYGXN G-2 G-Q GBLVA GBOLZ HAMUX HLZ HVGLF HZ~ IHE J1W JJJVA KOM LG9 LY1 LY7 M41 MO0 N9A O-L O9- OAUVE OZT P-8 P-9 P2P PC. PQQKQ Q38 R2- RIG ROL RPZ SBC SDF SDG SDP SDS SES SET SEW SPC SPCBC SSB SSD SSL SST SSV SSZ T5K TN5 WUQ XPP ZMT ~G- AATTM AAXKI AAYWO AAYXX ABJNI ABWVN ACRPL ACVFH ADCNI ADNMO AEIPS AEUPX AFPUW AFXIZ AGCQF AGRNS AIIUN AKBMS AKRWK AKYEP ANKPU CITATION SSH 7SC 8FD JQ2 L7M L~C L~D |
ID | FETCH-LOGICAL-c331t-7ccece4f4af73e6049347cd2217fcc31d3feb949aee16db63c6fcd75211db71a3 |
IEDL.DBID | .~1 |
ISSN | 0957-4174 |
IngestDate | Fri Jul 11 16:22:34 EDT 2025 Thu Apr 24 23:04:06 EDT 2025 Tue Jul 01 03:12:03 EDT 2025 Fri Feb 23 02:30:19 EST 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Keywords | Frequent pattern mining; Grid computing; Association rules; Data mining; Cluster computing |
Language | English |
License | https://www.elsevier.com/tdm/userlicense/1.0 |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c331t-7ccece4f4af73e6049347cd2217fcc31d3feb949aee16db63c6fcd75211db71a3 |
Notes | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23 |
PQID | 36167411 |
PQPubID | 23500 |
PageCount | 9 |
ParticipantIDs | proquest_miscellaneous_36167411 crossref_citationtrail_10_1016_j_eswa_2009_07_072 crossref_primary_10_1016_j_eswa_2009_07_072 elsevier_sciencedirect_doi_10_1016_j_eswa_2009_07_072 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2010-03-15 |
PublicationDateYYYYMMDD | 2010-03-15 |
PublicationDate_xml | – month: 03 year: 2010 text: 2010-03-15 day: 15 |
PublicationDecade | 2010 |
PublicationTitle | Expert systems with applications |
PublicationYear | 2010 |
Publisher | Elsevier Ltd |
Publisher_xml | – name: Elsevier Ltd |
SSID | ssj0017007 |
Score | 2.1682494 |
SourceID | proquest crossref elsevier |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 2486 |
SubjectTerms | Association rules; Cluster computing; Data mining; Frequent pattern mining; Grid computing |
Title | Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system |
URI | https://dx.doi.org/10.1016/j.eswa.2009.07.072 https://www.proquest.com/docview/36167411 |
Volume | 37 |
linkProvider | Elsevier |
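
Illustrative note: the abstract describes two ideas that a short sketch may make more concrete. A Tidset lets a node pick out the transactions containing given items without rescanning the database, and BTP-tree balances work according to each processor's computing ability. The Python below is only a minimal sketch under stated assumptions, not the authors' TPFP-tree or BTP-tree implementation. The function names (`build_tidsets`, `select_transactions`, `assign_items`), the use of tidset length as a stand-in for workload, and the greedy assignment rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the ascending list of transaction IDs (TIDs) that contain it."""
    tidsets = defaultdict(list)
    for tid, items in enumerate(transactions):
        for item in set(items):  # count each item once per transaction
            tidsets[item].append(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, items):
    """Fetch only the transactions containing any of `items`.

    The lookup goes through the tidsets, so the database is not scanned.
    """
    tids = sorted(set().union(*(tidsets.get(item, []) for item in items)))
    return [(tid, transactions[tid]) for tid in tids]


def assign_items(tidsets, capabilities):
    """Greedy, capability-weighted item assignment (an assumed heuristic).

    Items are taken in descending tidset size; each goes to the processor
    whose load-to-capability ratio would be smallest after receiving it.
    """
    loads = [0.0] * len(capabilities)
    assignment = [[] for _ in capabilities]
    for item in sorted(tidsets, key=lambda i: (-len(tidsets[i]), i)):
        cost = len(tidsets[item])
        target = min(
            range(len(capabilities)),
            key=lambda k: ((loads[k] + cost) / capabilities[k], k),
        )
        assignment[target].append(item)
        loads[target] += cost
    return assignment


if __name__ == "__main__":
    db = [["a", "b"], ["b", "c"], ["a", "c", "d"], ["b"]]
    index = build_tidsets(db)
    print(select_transactions(db, index, ["d"]))
    # Processor 0 is assumed to be twice as capable as processor 1.
    print(assign_items(index, [2.0, 1.0]))
```

In this sketch a faster processor simply receives items with larger combined tidsets; a real system would need a measured cost model and the transaction exchange step, neither of which is shown here.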