HIPU: A Hybrid Intelligent Processing Unit With Fine-Grained ISA for Real-Time Deep Neural Network Inference Applications
Published in | IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 31, No. 12, pp. 1980-1993
Main Authors | Zhao, Wenzhe; Yang, Guoming; Xia, Tian; Chen, Fei; Zheng, Nanning; Ren, Pengju
Format | Journal Article
Language | English
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2023
ISSN | 1063-8210 (print); 1557-9999 (electronic)
DOI | 10.1109/TVLSI.2023.3327110
Abstract | Neural network algorithms have shown superior performance over conventional algorithms, leading to the design and deployment of dedicated accelerators in practical scenarios. Coarse-grained accelerators achieve high performance but can support only a limited number of predesigned operators, which cannot cover the flexible operators emerging in modern neural network algorithms. Therefore, fine-grained accelerators, such as instruction set architecture (ISA)-based accelerators, have become a hot research topic because they are flexible enough to cover operators that were not predefined. The main challenges for fine-grained accelerators are the long single-image latency incurred when performing multibatch inference and the difficulty of meeting real-time constraints when processing multiple tasks simultaneously. This article proposes a hybrid intelligent processing unit (HIPU) to address these problems. Specifically, we design a novel conversion-free data format, expand the single-instruction multiple-data (SIMD) instruction set, and optimize the microarchitecture to improve performance. We also arrange the inference schedule to guarantee scalability across multiple cores. The experimental results show that the proposed accelerator maintains high multiply-accumulate (MAC) utilization for all common operators and achieves high performance, with a 4-7x speedup over an NVIDIA RTX 2080 Ti GPU. Finally, the proposed accelerator is manufactured in TSMC 28-nm technology, with each core running at 1 GHz and a peak performance of 13 TOPS.
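The abstract describes the conversion-free data format and the expanded SIMD multiply-accumulate instructions only at a high level. As a rough, hypothetical illustration of those two ideas (not the actual HIPU design), the C sketch below keeps int8 activations and weights, accumulates lane-wise products into int32, and requantizes the accumulators straight back to int8 for the next operator, so no intermediate floating-point conversion is needed between layers. The 16-lane width, the shift-based requantization, and all names here are assumptions made for this example.

#include <stdint.h>
#include <stdio.h>

#define LANES 16  /* hypothetical SIMD width: 16 int8 lanes per instruction */

/* One SIMD-style MAC step: acc[i] += a[i] * w[i] for every lane. */
static void mac_lanes(int32_t acc[LANES], const int8_t a[LANES], const int8_t w[LANES]) {
    for (int i = 0; i < LANES; i++)
        acc[i] += (int32_t)a[i] * (int32_t)w[i];
}

/* Requantize an int32 accumulator back to int8 with a power-of-two scale,
 * so the result can feed the next operator in the same int8 format
 * without any intermediate floating-point conversion. */
static int8_t requant(int32_t acc, int shift) {
    int32_t r = acc / (1 << shift);   /* truncating divide keeps this well-defined */
    if (r > 127) r = 127;             /* saturate to the int8 range */
    if (r < -128) r = -128;
    return (int8_t)r;
}

int main(void) {
    int8_t a[LANES], w[LANES], out[LANES];
    int32_t acc[LANES] = {0};

    for (int i = 0; i < LANES; i++) { a[i] = (int8_t)(i - 8); w[i] = 3; }

    /* Issue a few MAC "instructions", then requantize once per output lane. */
    for (int step = 0; step < 4; step++)
        mac_lanes(acc, a, w);
    for (int i = 0; i < LANES; i++)
        out[i] = requant(acc[i], 4);

    for (int i = 0; i < LANES; i++)
        printf("%d ", out[i]);
    printf("\n");
    return 0;
}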
Author | Chen, Fei; Zheng, Nanning; Xia, Tian; Ren, Pengju; Yang, Guoming; Zhao, Wenzhe
Author Details | Wenzhe Zhao (ORCID 0000-0002-7001-2125); Guoming Yang; Tian Xia (ORCID 0000-0002-2520-3731); Fei Chen; Nanning Zheng; Pengju Ren (ORCID 0000-0003-1163-2014; pengjuren@xjtu.edu.cn). All authors are with the National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, the National Engineering Research Center of Visual Information and Applications, and the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi, China.
CODEN | IEVSE9 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
DOI | 10.1109/TVLSI.2023.3327110 |
Discipline | Engineering |
EISSN | 1557-9999 |
EndPage | 1993 |
Genre | orig-research |
GrantInformation | National Natural Science Foundation of China (Grants 62302381 and 62088102); Fundamental Research Funds for the Central Universities (Grant xtr072022001); National Key Research and Development Program of China (Grant 2022YFB4500500); Key Research and Development Program of Shaanxi Province (Grant 2022ZDLGY01-08)
ISSN | 1063-8210 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 12 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0002-7001-2125 0000-0003-1163-2014 0000-0002-2520-3731 |
PageCount | 14 |
PublicationCentury | 2000 |
PublicationDate | 2023-12-01 |
PublicationDateYYYYMMDD | 2023-12-01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on very large scale integration (VLSI) systems |
PublicationTitleAbbrev | TVLSI |
PublicationYear | 2023 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1980 |
SubjectTerms | Accelerators; Algorithms; Artificial neural networks; Convolution; Design optimization; Inference; Inference algorithms; Matrix converters; Network-on-chip (NoC); neural network (NN) inference accelerating; Neural networks; Operators; out-of-order (OoO) superscalar processor; Performance enhancement; Real time; Real-time systems; reduced instruction set architecture; Schedules; Task analysis
Title | HIPU: A Hybrid Intelligent Processing Unit With Fine-Grained ISA for Real-Time Deep Neural Network Inference Applications |
URI | https://ieeexplore.ieee.org/document/10313116 https://www.proquest.com/docview/2895011416 |
Volume | 31 |