HIPU: A Hybrid Intelligent Processing Unit With Fine-Grained ISA for Real-Time Deep Neural Network Inference Applications

Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 31, No. 12, pp. 1980–1993
Main Authors: Zhao, Wenzhe; Yang, Guoming; Xia, Tian; Chen, Fei; Zheng, Nanning; Ren, Pengju
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2023

Summary: Neural network algorithms have shown superior performance over conventional algorithms, leading to the design and deployment of dedicated accelerators in practical scenarios. Coarse-grained accelerators achieve high performance but can support only a limited number of predesigned operators, which cannot cover the flexible operators emerging in modern neural network algorithms. Therefore, fine-grained accelerators, such as instruction set architecture (ISA)-based accelerators, have become a hot research topic due to their sufficient flexibility to cover unpredefined operators. The main challenges for fine-grained accelerators include the undesirably long delays of single-image inference when performing multibatch inference, as well as the difficulty of meeting real-time constraints when processing multiple tasks simultaneously. This article proposes a hybrid intelligent processing unit (HIPU) to address the aforementioned problems. Specifically, we design a novel conversion-free data format, expand the single-instruction multiple-data (SIMD) instruction set, and optimize the microarchitecture design to improve performance. We also arrange the inference schedule to guarantee scalability on multicores. The experimental results show that the proposed accelerator maintains high multiply-accumulate (MAC) utilization for all common operators and achieves high performance, with a 4–7× speedup over an NVIDIA RTX 2080 Ti GPU. Finally, the proposed accelerator is manufactured using TSMC 28-nm technology, achieving 1 GHz for each core, with a peak performance of 13 TOPS.
ISSN: 1063-8210, 1557-9999
DOI: 10.1109/TVLSI.2023.3327110