Sunstone: A Scalable and Versatile Scheduler for Mapping Tensor Algebra on Spatial Accelerators

Bibliographic Details
Published in: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 259–271
Main Authors: Olyaiy, MohammadHossein; Ng, Christopher; Fedorova, Alexandra Sasha; Lis, Mieszko
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2023
Subjects: accelerator architectures; dataflow computing; neural network hardware; parallel processing; scheduling algorithms
Online Access: https://ieeexplore.ieee.org/document/10158147
DOI: 10.1109/ISPASS57527.2023.00033

Abstract: Tensor algebra, the main component of several popular machine learning techniques, benefits from modern accelerators due to the massive parallelism and data reuse available. To achieve these benefits, however, optimizing the dataflow is crucial: prior work has shown that 19× energy savings are possible by tuning the dataflow. This optimization is challenging because (1) the optimization space for modern chip architectures with several levels of memory and multiple levels of spatial processing is vast, and (2) distinct tensor computations follow different memory access and reuse patterns. In this manuscript, we algebraically analyze the possible reuse when executing tensor workloads on an accelerator. Based on our analysis, we develop several principles that significantly reduce the dataflow optimization space, even for modern, complex chip architectures. Moreover, these principles are transferable to various tensor workloads with different memory access patterns. Compared to prior work, our techniques can find dataflows for typical tensor workloads up to 800× faster and with up to 1.9× better energy-delay products.
Author:
– Olyaiy, MohammadHossein (mohamadol@ece.ubc.ca), The University of British Columbia
– Ng, Christopher (chris.ng@ece.ubc.ca), The University of British Columbia
– Fedorova, Alexandra Sasha (sasha@ece.ubc.ca), The University of British Columbia
– Lis, Mieszko (mieszko@ece.ubc.ca), The University of British Columbia
CODEN: IEEPAD
EISBN: 9798350397390
EndPage: 271
ExternalDocumentID: 10158147
Genre: orig-research
PageCount: 13
StartPage: 259
URI: https://ieeexplore.ieee.org/document/10158147