True Load Balancing for Matricized Tensor Times Khatri-Rao Product

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 32, no. 8, pp. 1974-1986
Main Authors Abubaker, Nabil; Acer, Seher; Aykanat, Cevdet
Format Journal Article
Language English
Published New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 August 2021

Abstract MTTKRP (matricized tensor times Khatri-Rao product) is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, using the compressed sparse fibers (CSF) storage format and CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume that the computational cost of MTTKRP is proportional to the total number of nonzeros in the tensor. However, this does not hold for CSF-oriented MTTKRP on distributed-memory architectures. We outline two deficiencies of nonzero-based intelligent partitioning models when CSF-oriented MTTKRP operations are performed locally: they fail to encode processors' computational loads, and they increase total computation through fiber fragmentation. We focus on the existing fine-grain hypergraph model and propose a novel vertex weighting scheme that enables this model to encode the correct computational loads of processors. We also propose augmenting the fine-grain model with fiber nets, which reduces the increase in total computational load by minimizing fiber fragmentation. In this way, the proposed model encodes the objective of minimizing the load of the bottleneck processor. Parallel experiments with real-world sparse tensors on up to 1024 processors confirm the outlined deficiencies and demonstrate the merit of the proposed improvements in terms of parallel runtime.
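The abstract's central claim, that the local cost of CSF-oriented MTTKRP depends on fibers as well as on nonzeros, can be illustrated with a small sketch. The Python below is not the authors' vertex weighting scheme or hypergraph model. It assumes a third-order tensor, a mode-1 kernel that performs one length-R vector operation per nonzero plus one per fiber (i, j) (a simplified operation count chosen for illustration), and hypothetical helper names (`build_csf`, `csf_mttkrp_mode1`, `partition_costs`). Each "processor" builds a CSF-style tree over only its own nonzeros, so a fiber split across processors is paid for once per processor.

```python
from collections import defaultdict

import numpy as np


def build_csf(nonzeros):
    """Group (i, j, k, val) nonzeros into a 3-level tree: slice i -> fiber (i, j) -> leaves (k, val)."""
    tree = defaultdict(lambda: defaultdict(list))
    for i, j, k, v in nonzeros:
        tree[i][j].append((k, v))
    return tree


def csf_mttkrp_mode1(nonzeros, B, C, num_rows):
    """Mode-1 MTTKRP computed fiber by fiber over a CSF-style tree rooted at mode 1.

    Returns the (num_rows x R) result and the number of length-R vector
    operations performed: one per nonzero plus one per fiber (assumed cost model).
    """
    R = B.shape[1]
    M = np.zeros((num_rows, R))
    vec_ops = 0
    for i, fibers in build_csf(nonzeros).items():
        for j, leaves in fibers.items():
            acc = np.zeros(R)
            for k, v in leaves:
                acc += v * C[k]      # leaf level: one vector op per nonzero
                vec_ops += 1
            M[i] += acc * B[j]       # fiber level: one vector op per fiber
            vec_ops += 1
    return M, vec_ops


def partition_costs(nonzeros, parts, B, C, num_rows):
    """Run the local kernel on each part (a list of nonzero indices); return the summed result and per-part costs."""
    results = [csf_mttkrp_mode1([nonzeros[t] for t in p], B, C, num_rows) for p in parts]
    total_M = sum(M for M, _ in results)      # partial output rows are summed across parts
    costs = [ops for _, ops in results]
    return total_M, costs


# Toy 2 x 2 x 3 tensor: 8 nonzeros in 4 mode-1-rooted fibers (i, j).
nz = [
    (0, 0, 0, 1.0), (0, 0, 1, 2.0), (0, 0, 2, 3.0),
    (0, 1, 0, 4.0),
    (1, 0, 1, 5.0), (1, 0, 2, 6.0),
    (1, 1, 0, 7.0), (1, 1, 1, 8.0),
]
rng = np.random.default_rng(0)
R = 4
B, C = rng.random((2, R)), rng.random((3, R))

fiber_aligned = [[0, 1, 2, 3], [4, 5, 6, 7]]   # 4 nonzeros per part, no fiber is split
fragmented = [[0, 3, 4, 6], [1, 2, 5, 7]]      # 4 nonzeros per part, fibers are split

for name, parts in [("fiber-aligned", fiber_aligned), ("fragmented", fragmented)]:
    M, costs = partition_costs(nz, parts, B, C, num_rows=2)
    print(name, "nnz per part:", [len(p) for p in parts], "vector ops per part:", costs)
```

Both partitions give each part 4 nonzeros, so a nonzero-based model sees them as equally balanced. Under the assumed cost model, the fiber-aligned split costs 6 and 6 vector operations per part (12 in total, the same as the unpartitioned tensor), while the fragmented split costs 8 and 7 (15 in total), because fragmented fibers are processed on more than one part.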
Author details
– Nabil Abubaker (ORCID 0000-0002-5060-3059), Department of Computer Engineering, Bilkent University, Ankara, Turkey; nabil.abubaker@bilkent.edu.tr
– Seher Acer (ORCID 0000-0003-3951-3930), Sandia National Labs, Albuquerque, NM, USA; sacer@sandia.gov
– Cevdet Aykanat (ORCID 0000-0002-4559-1321), Department of Computer Engineering, Bilkent University, Ankara, Turkey; aykanat@cs.bilkent.edu.tr
Full Text https://www.osti.gov/servlets/purl/1765777 (OSTI.GOV)
CODEN ITDSEO
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
CorporateAuthor Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
DOI 10.1109/TPDS.2021.3053836
Discipline Engineering
Computer Science
EISSN 1558-2183
Genre orig-research
Funding Scientific and Technological Research Council of Turkey (Türkiye Bilimsel ve Teknolojik Araştırma Kurumu), grant EEEAG-116E043 (funder ID 10.13039/501100004410)
ISSN 1045-9219
Open Access yes
Peer Reviewed yes
Notes Sponsor: USDOE National Nuclear Security Administration (NNSA); contract AC04-94AL85000; Sandia report number SAND-2021-0767J
PageCount 13
PublicationTitleAbbrev TPDS
SubjectTerms Algorithms
Computational efficiency
Computational modeling
Computer architecture
Computing costs
CP Decomposition
Distributed memory
Fine-grain hypergraph partitioning
Fragmentation
Load
Load balancing
Load modeling
Mathematical analysis
MATHEMATICS AND COMPUTING
Microprocessors
MTTKRP
Partitioning
Partitioning algorithms
Processors
Program processors
Sparse matrices
Sparse tensors
Tensors
URI https://ieeexplore.ieee.org/document/9334414
https://www.proquest.com/docview/2493599271