HIPU: A Hybrid Intelligent Processing Unit With Fine-Grained ISA for Real-Time Deep Neural Network Inference Applications

Bibliographic Details
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 31, no. 12, pp. 1980-1993
Main Authors Zhao, Wenzhe; Yang, Guoming; Xia, Tian; Chen, Fei; Zheng, Nanning; Ren, Pengju
Format Journal Article
Language English
Published New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 December 2023
ISSN 1063-8210
EISSN 1557-9999
DOI 10.1109/TVLSI.2023.3327110

Abstract Neural network algorithms have shown superior performance over conventional algorithms, leading to the design and deployment of dedicated accelerators in practical scenarios. Coarse-grained accelerators achieve high performance but support only a limited number of predesigned operators, which cannot cover the flexible operators emerging in modern neural network algorithms. Therefore, fine-grained accelerators, such as instruction set architecture (ISA)-based accelerators, have become an active research topic because they are flexible enough to cover operators that are not predefined. The main challenges for fine-grained accelerators are the undesirably long delay of single-image inference when performing multibatch inference and the difficulty of meeting real-time constraints when processing multiple tasks simultaneously. This article proposes a hybrid intelligent processing unit (HIPU) to address these problems. Specifically, we design a novel conversion-free data format, expand the single-instruction multiple-data (SIMD) instruction set, and optimize the microarchitecture design to improve performance. We also arrange the inference schedule to guarantee scalability on multiple cores. The experimental results show that the proposed accelerator maintains high multiply-accumulate (MAC) utilization for all common operators and achieves a 4-7× speedup over an NVIDIA RTX 2080 Ti GPU. Finally, the proposed accelerator is manufactured using TSMC 28-nm technology, achieving 1 GHz for each core, with a peak performance of 13 TOPS.
Author_xml – sequence: 1
  givenname: Wenzhe
  orcidid: 0000-0002-7001-2125
  surname: Zhao
  fullname: Zhao, Wenzhe
  organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
– sequence: 2
  givenname: Guoming
  surname: Yang
  fullname: Yang, Guoming
  organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
– sequence: 3
  givenname: Tian
  orcidid: 0000-0002-2520-3731
  surname: Xia
  fullname: Xia, Tian
  organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
– sequence: 4
  givenname: Fei
  surname: Chen
  fullname: Chen, Fei
  organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
– sequence: 5
  givenname: Nanning
  surname: Zheng
  fullname: Zheng, Nanning
  organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
– sequence: 6
  givenname: Pengju
  orcidid: 0000-0003-1163-2014
  surname: Ren
  fullname: Ren, Pengju
  email: pengjuren@xjtu.edu.cn
  organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China
CODEN IEVSE9
CitedBy_id crossref_primary_10_1109_TVLSI_2024_3466224
crossref_primary_10_1109_TVLSI_2025_3527225
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Discipline Engineering
EISSN 1557-9999
EndPage 1993
ExternalDocumentID 10_1109_TVLSI_2023_3327110
10313116
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62302381; 62088102
  funderid: 10.13039/501100001809
– fundername: Fundamental Research Funds for the Central Universities
  grantid: xtr072022001
  funderid: 10.13039/501100012226
– fundername: National Key Research and Development Program of China
  grantid: 2022YFB4500500
  funderid: 10.13039/501100012166
– fundername: Key Research and Development Projects of Shaanxi Province; Key Research and Development Program of Shaanxi
  grantid: 2022ZDLGY01-08
  funderid: 10.13039/501100015401
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 14
PublicationDate 2023-12-01
PublicationDateYYYYMMDD 2023-12-01
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-12-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on very large scale integration (VLSI) systems
PublicationTitleAbbrev TVLSI
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1980
SubjectTerms Accelerators
Algorithms
Artificial neural networks
Convolution
Design optimization
Inference
Inference algorithms
Matrix converters
Network-on-chip (NoC)
neural network (NN) inference accelerating
Neural networks
Operators
out-of-order (OoO) superscalar processor
Performance enhancement
Real time
Real-time systems
reduced instruction set architecture
Schedules
Task analysis
Title HIPU: A Hybrid Intelligent Processing Unit With Fine-Grained ISA for Real-Time Deep Neural Network Inference Applications
URI https://ieeexplore.ieee.org/document/10313116
https://www.proquest.com/docview/2895011416
Volume 31