PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Main Authors | Ma, Zixuan; Wang, Haojie; Xing, Jingze; Zheng, Liyan; Zhang, Chen; Cao, Huanqi; Huang, Kezhao; Tang, Shizhi; Wang, Penghan; Zhai, Jidong |
---|---|
Format | Journal Article |
Language | English |
Published | 10.07.2023 |
Subjects | Computer Science - Learning; Computer Science - Programming Languages |
Online Access | https://arxiv.org/abs/2307.04995 |
Abstract | Deep neural networks (DNNs) are of critical use in different domains. To
accelerate DNN computation, tensor compilers are proposed to generate efficient
code on different domain-specific accelerators. Existing tensor compilers
mainly focus on optimizing computation efficiency. However, memory access is
becoming a key performance bottleneck because the computational performance of
accelerators is increasing much faster than memory performance. The lack of a
direct description of memory access and data dependence in the intermediate
representation (IR) of current tensor compilers makes it difficult to generate
memory-efficient code.
In this paper, we propose IntelliGen, a tensor compiler that generates
high-performance code for memory-intensive operators by considering both
computation and data-movement optimizations. IntelliGen represents a DNN program
using GIR, which includes primitives indicating its computation, data movement,
and parallel strategies. This information is further composed into an
instruction-level dataflow graph, on which holistic optimizations are performed
by searching over different memory access patterns and computation operations
and generating memory-efficient code for different hardware. We evaluate
IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedups of up to
1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively,
compared to the current best-performing frameworks. |
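To illustrate the idea the abstract describes, the sketch below models a toy instruction-level dataflow graph with explicit load, compute, and store primitives, and a pass that removes store-load round trips between fused compute nodes. This is a hypothetical illustration only: the class and function names (`Prim`, `fuse`, `count_mem_ops`) are invented here and are not the paper's actual GIR or its API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Prim:
    """One instruction-level primitive in a toy dataflow graph (illustrative only)."""
    name: str
    kind: str                             # "load" | "compute" | "store"
    inputs: List["Prim"] = field(default_factory=list)

def build_unfused() -> Prim:
    # y = relu(x + b), written naively: the intermediate result of the add
    # round-trips through memory (store_t then load_t) before the relu.
    lx = Prim("load_x", "load")
    lb = Prim("load_b", "load")
    add = Prim("add", "compute", [lx, lb])
    st1 = Prim("store_t", "store", [add])
    lt = Prim("load_t", "load", [st1])
    relu = Prim("relu", "compute", [lt])
    return Prim("store_y", "store", [relu])

def fuse(node: Prim) -> Prim:
    """Bypass store->load pairs between compute primitives, keeping the
    intermediate value on-chip instead of spilling it to global memory."""
    node.inputs = [fuse(i) for i in node.inputs]
    new_inputs = []
    for i in node.inputs:
        if i.kind == "load" and i.inputs and i.inputs[0].kind == "store":
            # Connect the consumer directly to the producer of the stored value.
            new_inputs.append(i.inputs[0].inputs[0])
        else:
            new_inputs.append(i)
    node.inputs = new_inputs
    return node

def count_mem_ops(node: Prim, seen=None) -> int:
    """Count load/store primitives reachable from `node` (a proxy for memory traffic)."""
    seen = seen if seen is not None else set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    n = 1 if node.kind in ("load", "store") else 0
    return n + sum(count_mem_ops(i, seen) for i in node.inputs)
```

Making data movement a first-class graph node, as in this sketch, is what lets a pass reason about (and eliminate) memory traffic directly, which is the gap in computation-centric IRs that the abstract points to.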
Author | Tang, Shizhi; Zheng, Liyan; Xing, Jingze; Ma, Zixuan; Wang, Haojie; Cao, Huanqi; Zhai, Jidong; Zhang, Chen; Wang, Penghan; Huang, Kezhao |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2307.04995 |
OpenAccessLink | https://arxiv.org/abs/2307.04995 |
SecondaryResourceType | preprint |
SubjectTerms | Computer Science - Learning Computer Science - Programming Languages |