True Load Balancing for Matricized Tensor Times Khatri-Rao Product

MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and the CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory archite...

Bibliographic Details
Published in	IEEE transactions on parallel and distributed systems Vol. 32; no. 8; pp. 1974 - 1986
Main Authors	Abubaker, Nabil; Acer, Seher; Aykanat, Cevdet
Format	Journal Article
Language	English
Published	New York: IEEE, 01.08.2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Computational efficiency; Computational modeling; Computer architecture; Computing costs; CP Decomposition; Distributed memory; Fine-grain hypergraph partitioning; Fragmentation; Load; Load balancing; Load modeling; Mathematical analysis; MATHEMATICS AND COMPUTING; Microprocessors; MTTKRP; Partitioning; Partitioning algorithms; Processors; Program processors; Sparse matrices; Sparse tensors; Tensors
Abstract	MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF) storage format and the CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume the computational cost of MTTKRP to be proportional to the total number of nonzeros in the tensor. However, this is not the case for the CSF-oriented MTTKRP on distributed-memory architectures. We outline two deficiencies of nonzero-based intelligent partitioning models when CSF-oriented MTTKRP operations are performed locally: failure to encode processors' computational loads and an increase in total computation due to fiber fragmentation. We focus on the existing fine-grain hypergraph model and propose a novel vertex weighting scheme that enables this model to encode the correct computational loads of processors. We also propose augmenting the fine-grain model with fiber nets to reduce the increase in total computational load by minimizing fiber fragmentation. In this way, the proposed model encodes the objective of minimizing the load of the bottleneck processor. Parallel experiments with real-world sparse tensors on up to 1024 processors prove the validity of the outlined deficiencies and demonstrate the merit of our proposed improvements in terms of parallel runtimes.
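Illustrative note (not part of the catalogued record or the paper): the fiber-fragmentation effect named in the abstract can be sketched in a few lines of Python. The sketch assumes a 3-mode tensor whose fibers are identified by their first two indices, and it assumes that a CSF-oriented kernel does some work once per locally stored fiber; the function name `fiber_counts` and the toy data are hypothetical. It is not the authors' vertex weighting scheme, only a small demonstration that two partitions with identical nonzero counts can hold different numbers of fibers.

```python
from collections import defaultdict

def fiber_counts(nonzeros, part_of):
    """Count the distinct (i, j) fibers stored by each part.

    nonzeros: list of (i, j, k) coordinates of a 3-mode sparse tensor.
    part_of:  list giving the part id assigned to each nonzero.
    """
    fibers = defaultdict(set)
    for (i, j, _k), p in zip(nonzeros, part_of):
        fibers[p].add((i, j))
    return {p: len(s) for p, s in fibers.items()}

# Six nonzeros in two fibers, (0, 0, :) and (1, 2, :), three nonzeros each.
nonzeros = [(0, 0, 0), (0, 0, 1), (0, 0, 2),
            (1, 2, 0), (1, 2, 1), (1, 2, 3)]

# Both partitions assign three nonzeros to each part, so a model that
# only counts nonzeros cannot tell them apart.
aligned = [0, 0, 0, 1, 1, 1]      # each fiber stays whole on one part
fragmented = [0, 0, 1, 0, 1, 1]   # each fiber is split across both parts

aligned_fibers = fiber_counts(nonzeros, aligned)
fragmented_fibers = fiber_counts(nonzeros, fragmented)

total_aligned = sum(aligned_fibers.values())
total_fragmented = sum(fragmented_fibers.values())

print(aligned_fibers, total_aligned)
print(fragmented_fibers, total_fragmented)
```

Under the stated assumption of per-fiber work, the fragmented partition doubles the total number of fibers processed even though the nonzero balance is unchanged, which is the kind of extra computation the abstract attributes to fiber fragmentation.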
Author	Aykanat, Cevdet; Acer, Seher; Abubaker, Nabil
Author_xml	– sequence: 1 givenname: Nabil orcidid: 0000-0002-5060-3059 surname: Abubaker fullname: Abubaker, Nabil email: nabil.abubaker@bilkent.edu.tr organization: Department of Computer Engineering, Bilkent University, Ankara, Turkey – sequence: 2 givenname: Seher orcidid: 0000-0003-3951-3930 surname: Acer fullname: Acer, Seher email: sacer@sandia.gov organization: Sandia National Labs, Albuquerque, NM, USA – sequence: 3 givenname: Cevdet orcidid: 0000-0002-4559-1321 surname: Aykanat fullname: Aykanat, Cevdet email: aykanat@cs.bilkent.edu.tr organization: Department of Computer Engineering, Bilkent University, Ankara, Turkey
BackLink	https://www.osti.gov/servlets/purl/1765777$$D View this record in Osti.gov
CODEN	ITDSEO
CitedBy_id	crossref_primary_10_1109_TPDS_2021_3128827 crossref_primary_10_1109_TPDS_2023_3288520
Cites_doi	10.1145/2833179.2833183 10.1109/ASONAM.2011.80 10.1007/s10462-020-09916-4 10.1039/c3ay41160e 10.1137/18M1210691 10.1007/s42514-019-00012-w 10.1145/2736277.2741077 10.1109/IPDPS.2015.27 10.1109/IPDPS.2001.925093 10.1109/TSP.2017.2690524 10.1109/TPDS.2020.3012624 10.1137/16M1102744 10.1109/71.780863 10.1109/SC.2018.00022 10.1145/2807591.2807624 10.1093/bioinformatics/btm210 10.1145/2339530.2339583 10.1109/92.748202 10.1109/IPDPS.2016.113 10.1137/060676489 10.1109/TKDE.2008.112 10.1137/S0036144502409019 10.1109/IPDPS.2014.62 10.1137/07070111X 10.1145/1921632.1921636 10.1145/2487575.2487619 10.1145/2507157.2507163 10.1137/080737770 10.1109/TPDS.2018.2841843
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
Copyright_xml	– notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
CorporateAuthor	Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
CorporateAuthor_xml	– name: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
DBID	97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D OIOZB OTOTI
DOI	10.1109/TPDS.2021.3053836
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional OSTI.GOV - Hybrid OSTI.GOV
DatabaseTitle	CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional
DatabaseTitleList	Technology Research Database
Database_xml	– sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering Computer Science
EISSN	1558-2183
EndPage	1986
ExternalDocumentID	1765777 10_1109_TPDS_2021_3053836 9334414
Genre	orig-research
GrantInformation_xml	– fundername: Türkiye Bilimsel ve Teknolojik Araştirma Kurumu; Scientific and Technological Research Council of Turkey grantid: EEEAG-116E043 funderid: 10.13039/501100004410
GroupedDBID	--Z -~X .DC 0R~ 29I 4.4 5GY 6IK 97E AAJGR AASAJ ABQJQ ABVLG ACGFO ACIWK AENEX AKJIK ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD HZ~ IEDLZ IFIPE IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIC RIE RIG RNS TN5 TWZ UHB AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D ABPTK OIOZB OTOTI PQEST
ID	FETCH-LOGICAL-c363t-5bf0474ec95887e69ed228aa2495cb860dcf1a2e2f2e7f888c594e78900dd34c3
IEDL.DBID	RIE
ISSN	1045-9219
IngestDate	Fri May 19 00:37:17 EDT 2023 Thu Oct 10 18:07:10 EDT 2024 Fri Aug 23 00:58:48 EDT 2024 Wed Jun 26 19:26:29 EDT 2024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	8
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-c363t-5bf0474ec95887e69ed228aa2495cb860dcf1a2e2f2e7f888c594e78900dd34c3
Notes	AC04-94AL85000 USDOE National Nuclear Security Administration (NNSA) SAND-2021-0767J
ORCID	0000-0002-5060-3059 0000-0003-3951-3930 0000-0002-4559-1321
OpenAccessLink	https://www.osti.gov/servlets/purl/1765777
PQID	2493599271
PQPubID	85437
PageCount	13
ParticipantIDs	proquest_journals_2493599271 osti_scitechconnect_1765777 ieee_primary_9334414 crossref_primary_10_1109_TPDS_2021_3053836
PublicationCentury	2000
PublicationDate	2021-08-01
PublicationDateYYYYMMDD	2021-08-01
PublicationDate_xml	– month: 08 year: 2021 text: 2021-08-01 day: 01
PublicationDecade	2020
PublicationPlace	New York
PublicationPlace_xml	– name: New York – name: United States
PublicationTitle	IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev	TPDS
PublicationYear	2021
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml	– name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID	ssj0014504
Score	2.394259
Snippet	MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fibers (CSF)...
SourceID	osti proquest crossref ieee
SourceType	Open Access Repository Aggregation Database Publisher
StartPage	1974
SubjectTerms	Algorithms; Computational efficiency; Computational modeling; Computer architecture; Computing costs; CP Decomposition; Distributed memory; Fine-grain hypergraph partitioning; Fragmentation; Load; Load balancing; Load modeling; Mathematical analysis; MATHEMATICS AND COMPUTING; Microprocessors; MTTKRP; Partitioning; Partitioning algorithms; Processors; Program processors; Sparse matrices; Sparse tensors; Tensors
Title	True Load Balancing for Matricized Tensor Times Khatri-Rao Product
URI	https://ieeexplore.ieee.org/document/9334414 https://www.proquest.com/docview/2493599271 https://www.osti.gov/servlets/purl/1765777
Volume	32
hasFullText	1
inHoldings	1
isFullTextHit
isPrint