True Load Balancing for Matricized Tensor Times Khatri-Rao Product
MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and the CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory archite...
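The abstract argues that nonzero counts do not capture the cost of CSF-oriented MTTKRP. The following is a minimal sketch of that argument for a 3-mode tensor, not the authors' implementation. It assumes a simplified cost model (one rank-length vector update per nonzero plus one per distinct fiber), and the names `mttkrp_coo`, `mttkrp_csf`, and `csf_cost` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def mttkrp_coo(nonzeros, B, C, num_rows, rank):
    """Mode-1 MTTKRP, one nonzero at a time: M[i] += v * (B[j] * C[k])."""
    M = [[0.0] * rank for _ in range(num_rows)]
    for i, j, k, v in nonzeros:
        for r in range(rank):
            M[i][r] += v * B[j][r] * C[k][r]
    return M


def mttkrp_csf(nonzeros, B, C, num_rows, rank):
    """Mode-1 MTTKRP grouped by fiber (i, j), as a CSF-style traversal would do.

    Returns the result and the number of rank-length vector updates under the
    assumed model: one per nonzero (accumulate v * C[k]) plus one per fiber
    (scale the accumulator by B[j] and add it to M[i]).
    """
    fibers = defaultdict(list)
    for i, j, k, v in nonzeros:
        fibers[(i, j)].append((k, v))
    M = [[0.0] * rank for _ in range(num_rows)]
    vector_ops = 0
    for (i, j), entries in fibers.items():
        acc = [0.0] * rank
        for k, v in entries:
            for r in range(rank):
                acc[r] += v * C[k][r]
            vector_ops += 1
        for r in range(rank):
            M[i][r] += acc[r] * B[j][r]
        vector_ops += 1
    return M, vector_ops


def csf_cost(nonzeros):
    """Assumed cost of the CSF traversal: nonzeros + distinct (i, j) fibers."""
    return len(nonzeros) + len({(i, j) for i, j, _, _ in nonzeros})


# Two parts with the same number of nonzeros but different fiber structure.
long_fiber = [(0, 0, k, 1.0) for k in range(4)]  # 4 nonzeros in 1 fiber
scattered = [(1, j, 0, 1.0) for j in range(4)]   # 4 nonzeros in 4 fibers

print(csf_cost(long_fiber), csf_cost(scattered))            # 5 8
# Splitting the long fiber across two processors adds one fiber-level update.
print(csf_cost(long_fiber[:2]) + csf_cost(long_fiber[2:]))  # 6
```

Under this assumed model, two processors holding four nonzeros each carry loads of 5 and 8, and fragmenting a single fiber raises the total work from 5 to 6. These are the two effects the abstract attributes to nonzero-based partitioning.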
Published in | IEEE transactions on parallel and distributed systems Vol. 32; no. 8; pp. 1974 - 1986 |
---|---|
Main Authors | Abubaker, Nabil; Acer, Seher; Aykanat, Cevdet |
Format | Journal Article |
Language | English |
Published | New York: IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2021 |
Subjects | |
Abstract | MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and the CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume the computational cost of MTTKRP to be proportional to the total number of nonzeros in the tensor. However, this is not the case for the CSF-oriented MTTKRP on distributed-memory architectures. We outline two deficiencies of nonzero-based intelligent partitioning models when CSF-oriented MTTKRP operations are performed locally: failure to encode processors' computational loads and an increase in total computation due to fiber fragmentation. We focus on the existing fine-grain hypergraph model and propose a novel vertex weighting scheme that enables this model to encode the correct computational loads of processors. We also propose to augment the fine-grain model with fiber nets, which reduce the increase in total computational load by minimizing fiber fragmentation. In this way, the proposed model encodes the objective of minimizing the load of the bottleneck processor. Parallel experiments with real-world sparse tensors on up to 1024 processors confirm the outlined deficiencies and demonstrate the merit of our proposed improvements in terms of parallel runtimes. |
---|---|
Author | Aykanat, Cevdet; Acer, Seher; Abubaker, Nabil |
Author_xml | – sequence: 1 givenname: Nabil orcidid: 0000-0002-5060-3059 surname: Abubaker fullname: Abubaker, Nabil email: nabil.abubaker@bilkent.edu.tr organization: Department of Computer Engineering, Bilkent University, Ankara, Turkey – sequence: 2 givenname: Seher orcidid: 0000-0003-3951-3930 surname: Acer fullname: Acer, Seher email: sacer@sandia.gov organization: Sandia National Labs, Albuquerque, NM, USA – sequence: 3 givenname: Cevdet orcidid: 0000-0002-4559-1321 surname: Aykanat fullname: Aykanat, Cevdet email: aykanat@cs.bilkent.edu.tr organization: Department of Computer Engineering, Bilkent University, Ankara, Turkey |
BackLink | https://www.osti.gov/servlets/purl/1765777$$D View this record in Osti.gov |
CODEN | ITDSEO |
CitedBy_id | crossref_primary_10_1109_TPDS_2021_3128827 crossref_primary_10_1109_TPDS_2023_3288520 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021 |
CorporateAuthor | Sandia National Lab. (SNL-NM), Albuquerque, NM (United States) |
CorporateAuthor_xml | – name: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States) |
DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D OIOZB OTOTI |
DOI | 10.1109/TPDS.2021.3053836 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional OSTI.GOV - Hybrid OSTI.GOV |
DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISSN | 1558-2183 |
EndPage | 1986 |
ExternalDocumentID | 1765777 10_1109_TPDS_2021_3053836 9334414 |
Genre | orig-research |
GrantInformation_xml | – fundername: Türkiye Bilimsel ve Teknolojik Araştirma Kurumu; Scientific and Technological Research Council of Turkey grantid: EEEAG-116E043 funderid: 10.13039/501100004410 |
ID | FETCH-LOGICAL-c363t-5bf0474ec95887e69ed228aa2495cb860dcf1a2e2f2e7f888c594e78900dd34c3 |
IEDL.DBID | RIE |
ISSN | 1045-9219 |
IngestDate | Fri May 19 00:37:17 EDT 2023 Thu Oct 10 18:07:10 EDT 2024 Fri Aug 23 00:58:48 EDT 2024 Wed Jun 26 19:26:29 EDT 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 8 |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c363t-5bf0474ec95887e69ed228aa2495cb860dcf1a2e2f2e7f888c594e78900dd34c3 |
Notes | AC04-94AL85000 USDOE National Nuclear Security Administration (NNSA) SAND-2021-0767J |
ORCID | 0000-0002-5060-3059 0000-0003-3951-3930 0000-0002-4559-1321 |
OpenAccessLink | https://www.osti.gov/servlets/purl/1765777 |
PQID | 2493599271 |
PQPubID | 85437 |
PageCount | 13 |
ParticipantIDs | proquest_journals_2493599271 osti_scitechconnect_1765777 ieee_primary_9334414 crossref_primary_10_1109_TPDS_2021_3053836 |
PublicationCentury | 2000 |
PublicationDate | 2021-08-01 |
PublicationDateYYYYMMDD | 2021-08-01 |
PublicationDate_xml | – month: 08 year: 2021 text: 2021-08-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York – name: United States |
PublicationTitle | IEEE transactions on parallel and distributed systems |
PublicationTitleAbbrev | TPDS |
PublicationYear | 2021 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SSID | ssj0014504 |
Score | 2.394259 |
SourceID | osti proquest crossref ieee |
SourceType | Open Access Repository Aggregation Database Publisher |
StartPage | 1974 |
SubjectTerms | Algorithms Computational efficiency Computational modeling Computer architecture Computing costs CP Decomposition Distributed memory Fine-grain hypergraph partitioning Fragmentation Load Load balancing Load modeling Mathematical analysis MATHEMATICS AND COMPUTING Microprocessors MTTKRP Partitioning Partitioning algorithms Processors Program processors Sparse matrices Sparse tensors Tensors |
Title | True Load Balancing for Matricized Tensor Times Khatri-Rao Product |
URI | https://ieeexplore.ieee.org/document/9334414 https://www.proquest.com/docview/2493599271 https://www.osti.gov/servlets/purl/1765777 |
Volume | 32 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |