Sunstone: A Scalable and Versatile Scheduler for Mapping Tensor Algebra on Spatial Accelerators
Published in | 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 259 - 271 |
---|---|
Main Authors | Olyaiy, MohammadHossein; Ng, Christopher; Fedorova, Alexandra Sasha; Lis, Mieszko |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.04.2023 |
Subjects | accelerator architectures; dataflow computing; neural network hardware; parallel processing; scheduling algorithms |
Online Access | Get full text |
DOI | 10.1109/ISPASS57527.2023.00033 |
Abstract | Tensor algebra, the main component of several popular machine learning techniques, benefits from modern accelerators due to the massive parallelism and data reuse available. To achieve the benefits, however, optimizing the dataflow is crucial: prior works showed that 19× energy savings are possible by tuning the dataflow. This optimization is challenging because: (1) the optimization space for modern chip architectures with several levels of memory and multiple levels of spatial processing is vast, and (2) distinct tensor computations follow different memory access and reuse patterns. In this manuscript, we algebraically analyze the possible reuse when executing tensor workloads on an accelerator. Based on our analysis, we develop several principles that significantly reduce the dataflow optimization space even for modern, complex chip architectures. Moreover, these principles are transferable to various tensor workloads with different memory access patterns. Compared to prior work, our techniques can find dataflows for typical tensor workloads up to 800× faster and with up to 1.9× better energy-delay products. |
---|---|
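The abstract's claim that the dataflow optimization space is vast can be made concrete with a small, hypothetical illustration. The sketch below is not Sunstone's algorithm and is not taken from the paper; it simply counts loop orderings and tile-size choices for a single matrix multiply on an assumed two-level memory hierarchy. The function names and the simplified model of the mapping space are illustrative assumptions.

```python
# Illustrative sketch only: a naive count of the dataflow/mapping space for
# C[M,N] += A[M,K] * B[K,N] on an assumed two-level memory hierarchy.
# This is NOT the paper's method; it only shows why exhaustive search is costly.
from itertools import permutations


def divisors(n: int) -> list[int]:
    """All tile sizes that evenly divide a dimension of size n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def count_mappings(m: int, n: int, k: int, levels: int = 2) -> int:
    """Count (loop order x tile size) choices, compounded across memory levels."""
    loop_orders = len(list(permutations("MNK")))  # 3! = 6 orderings per level
    tilings = len(divisors(m)) * len(divisors(n)) * len(divisors(k))
    return (loop_orders * tilings) ** levels


if __name__ == "__main__":
    # Even a small 64x64x64 matmul on two memory levels yields ~4.2 million mappings,
    # before spatial (parallel) dimensions are even considered.
    print(count_mappings(64, 64, 64))
```

Under these assumptions the count grows multiplicatively with every added memory or spatial level, which is the combinatorial explosion that the pruning principles described in the abstract are meant to tame.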
Author Details | Olyaiy, MohammadHossein (mohamadol@ece.ubc.ca); Ng, Christopher (chris.ng@ece.ubc.ca); Fedorova, Alexandra Sasha (sasha@ece.ubc.ca); Lis, Mieszko (mieszko@ece.ubc.ca); all with The University of British Columbia |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
EISBN | 9798350397390 |
EndPage | 271 |
ExternalDocumentID | 10158147 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
PageCount | 13 |
PublicationDate | 2023-April |
PublicationTitle | 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) |
PublicationTitleAbbrev | ISPASS |
PublicationYear | 2023 |
Publisher | IEEE |
StartPage | 259 |
SubjectTerms | accelerator architectures dataflow computing neural network hardware parallel processing scheduling algorithms |
URI | https://ieeexplore.ieee.org/document/10158147 |