PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Main Authors | Ma, Zixuan; Wang, Haojie; Xing, Jingze; Zheng, Liyan; Zhang, Chen; Cao, Huanqi; Huang, Kezhao; Tang, Shizhi; Wang, Penghan; Zhai, Jidong |
---|---|
Format | Journal Article |
Language | English |
Published | 10.07.2023 |
Subjects | Computer Science - Learning; Computer Science - Programming Languages |
Online Access | https://arxiv.org/abs/2307.04995 |
Abstract | Deep neural networks (DNNs) are of critical use in different domains. To
accelerate DNN computation, tensor compilers are proposed to generate efficient
code on different domain-specific accelerators. Existing tensor compilers
mainly focus on optimizing computation efficiency. However, memory access is
becoming a key performance bottleneck because the computational performance of
accelerators is increasing much faster than memory performance. The lack of a
direct description of memory access and data dependence in the intermediate
representation (IR) of current tensor compilers makes it difficult to generate
memory-efficient code.
In this paper, we propose IntelliGen, a tensor compiler that generates
high-performance code for memory-intensive operators by considering both
computation and data-movement optimizations. IntelliGen represents a DNN program
using GIR, which includes primitives indicating its computation, data movement,
and parallel strategies. This information is further composed into an
instruction-level dataflow graph, on which holistic optimizations are performed
by searching over different memory access patterns and computation operations
and generating memory-efficient code for different hardware. We evaluate
IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedups of up to
1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively,
compared to the current best-performing frameworks. |
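To illustrate the idea the abstract describes, the sketch below models a toy instruction-level dataflow graph with explicit load, compute, and store primitives, and a pass that removes store-load round trips between fused compute nodes. This is a hypothetical illustration only: the class and function names (`Prim`, `fuse`, `count_mem_ops`) are invented here and are not the paper's actual GIR or its API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Prim:
    """One instruction-level primitive in a toy dataflow graph (illustrative only)."""
    name: str
    kind: str                             # "load" | "compute" | "store"
    inputs: List["Prim"] = field(default_factory=list)

def build_unfused() -> Prim:
    # y = relu(x + b), written naively: the intermediate result of the add
    # round-trips through memory (store_t then load_t) before the relu.
    lx = Prim("load_x", "load")
    lb = Prim("load_b", "load")
    add = Prim("add", "compute", [lx, lb])
    st1 = Prim("store_t", "store", [add])
    lt = Prim("load_t", "load", [st1])
    relu = Prim("relu", "compute", [lt])
    return Prim("store_y", "store", [relu])

def fuse(node: Prim) -> Prim:
    """Bypass store->load pairs between compute primitives, keeping the
    intermediate value on-chip instead of spilling it to global memory."""
    node.inputs = [fuse(i) for i in node.inputs]
    new_inputs = []
    for i in node.inputs:
        if i.kind == "load" and i.inputs and i.inputs[0].kind == "store":
            # Connect the consumer directly to the producer of the stored value.
            new_inputs.append(i.inputs[0].inputs[0])
        else:
            new_inputs.append(i)
    node.inputs = new_inputs
    return node

def count_mem_ops(node: Prim, seen=None) -> int:
    """Count load/store primitives reachable from `node` (a proxy for memory traffic)."""
    seen = seen if seen is not None else set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    n = 1 if node.kind in ("load", "store") else 0
    return n + sum(count_mem_ops(i, seen) for i in node.inputs)
```

Making data movement a first-class graph node, as in this sketch, is what lets a pass reason about (and eliminate) memory traffic directly, which is the gap in computation-centric IRs that the abstract points to.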
Author | Tang, Shizhi; Zheng, Liyan; Xing, Jingze; Ma, Zixuan; Wang, Haojie; Cao, Huanqi; Zhai, Jidong; Zhang, Chen; Wang, Penghan; Huang, Kezhao |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2307.04995 |
OpenAccessLink | https://arxiv.org/abs/2307.04995 |
SecondaryResourceType | preprint |
SubjectTerms | Computer Science - Learning Computer Science - Programming Languages |