PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Bibliographic Details
Main Authors: Ma, Zixuan; Wang, Haojie; Xing, Jingze; Zheng, Liyan; Zhang, Chen; Cao, Huanqi; Huang, Kezhao; Tang, Shizhi; Wang, Penghan; Zhai, Jidong
Format: Journal Article
Language: English
Published: 10.07.2023
Abstract: Deep neural networks (DNNs) are of critical use in many domains. To accelerate DNN computation, tensor compilers have been proposed to generate efficient code for different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck, because the computational performance of accelerators is increasing much faster than memory performance. The lack of a direct description of memory access and data dependences in the intermediate representation (IR) of current tensor compilers makes it significantly harder to generate memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data-movement optimizations. IntelliGen represents a DNN program using GIR, whose primitives indicate the program's computation, data movement, and parallel strategies. This information is further composed into an instruction-level dataflow graph, on which IntelliGen performs holistic optimization by searching over different memory-access patterns and computation operations and generating memory-efficient code for different hardware. We evaluate IntelliGen on NVIDIA GPUs, AMD GPUs, and the Cambricon MLU, showing speedups of up to 1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively, over the current most performant frameworks.
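The abstract's central idea — an IR whose nodes are individual computation and data-movement instructions connected by data dependences, so that memory traffic is explicit and fusable — can be illustrated with a minimal sketch. This is a hypothetical toy IR for illustration only, not the paper's actual GIR; the class and primitive names (`GirNode`, `load`, `compute`, `store`) are invented.

```python
from dataclasses import dataclass, field

# Toy instruction-level dataflow IR: each node is either a data-movement
# primitive (load/store between "global" and "local" memory) or a compute
# primitive, and input edges record data dependences between instructions.
@dataclass
class GirNode:
    name: str
    kind: str                                   # "load", "compute", or "store"
    inputs: list = field(default_factory=list)  # upstream GirNodes
    fn: callable = None                         # compute nodes: fn over inputs

def evaluate(node, global_mem, cache=None):
    """Execute the graph rooted at `node` by walking its data dependences;
    `cache` memoizes nodes shared by several consumers."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.kind == "load":                     # data movement: global -> local
        val = list(global_mem[node.name])
    elif node.kind == "compute":                # pure computation on local values
        args = [evaluate(i, global_mem, cache) for i in node.inputs]
        val = node.fn(*args)
    else:                                       # "store": local -> global
        val = evaluate(node.inputs[0], global_mem, cache)
        global_mem[node.name] = val
    cache[node.name] = val
    return val

# Fused elementwise graph: C = relu(A + B) with one load per input and one
# store, so the intermediate A+B never round-trips through global memory.
a = GirNode("A", "load")
b = GirNode("B", "load")
add = GirNode("add", "compute", [a, b],
              fn=lambda x, y: [u + v for u, v in zip(x, y)])
relu = GirNode("relu", "compute", [add],
               fn=lambda x: [max(u, 0) for u in x])
store = GirNode("C", "store", [relu])

mem = {"A": [1, -2, 3], "B": [1, 1, -9]}
evaluate(store, mem)
print(mem["C"])  # [2, 0, 0]
```

Because every load and store is a first-class node, a compiler over such a graph can enumerate alternative data-movement patterns (e.g. fusing the two compute nodes so `add`'s result stays local) rather than treating memory access as a side effect of computation.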
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arxiv.2307.04995
Open Access Link: https://arxiv.org/abs/2307.04995
Resource Type: Preprint
Subjects: Computer Science - Learning; Computer Science - Programming Languages