Sunstone: A Scalable and Versatile Scheduler for Mapping Tensor Algebra on Spatial Accelerators

Bibliographic Details
Published in: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 259–271
Main Authors: Olyaiy, MohammadHossein; Ng, Christopher; Fedorova, Alexandra Sasha; Lis, Mieszko
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2023
Subjects: accelerator architectures; dataflow computing; neural network hardware; parallel processing; scheduling algorithms
Online Access: https://ieeexplore.ieee.org/document/10158147
DOI: 10.1109/ISPASS57527.2023.00033

Abstract: Tensor algebra, the main component of several popular machine learning techniques, benefits from modern accelerators due to the massive parallelism and data reuse available. To achieve these benefits, however, optimizing the dataflow is crucial: prior work has shown that 19× energy savings are possible by tuning the dataflow. This optimization is challenging because (1) the optimization space for modern chip architectures with several levels of memory and multiple levels of spatial processing is vast, and (2) distinct tensor computations follow different memory access and reuse patterns. In this manuscript, we algebraically analyze the possible reuse when executing tensor workloads on an accelerator. Based on our analysis, we develop several principles that significantly reduce the dataflow optimization space, even for modern, complex chip architectures. Moreover, these principles are transferable to various tensor workloads with different memory access patterns. Compared to prior work, our techniques can find dataflows for typical tensor workloads up to 800× faster and with up to 1.9× better energy-delay products.
Author:
– Olyaiy, MohammadHossein (mohamadol@ece.ubc.ca), The University of British Columbia
– Ng, Christopher (chris.ng@ece.ubc.ca), The University of British Columbia
– Fedorova, Alexandra Sasha (sasha@ece.ubc.ca), The University of British Columbia
– Lis, Mieszko (mieszko@ece.ubc.ca), The University of British Columbia
CODEN: IEEPAD
EISBN: 9798350397390
EndPage: 271
ExternalDocumentID: 10158147
Genre: orig-research
PageCount: 13
StartPage: 259
URI: https://ieeexplore.ieee.org/document/10158147