Configurable Deep Learning Accelerator with Bitwise-accurate Training and Verification

Bibliographic Details
Published in: 2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-4
Main Authors: Luo, Shien-Chun; Chang, Kuo-Chiang; Chen, Po-Wei; Chen, Zhao-Hong
Format: Conference Proceeding
Language: English
Published: IEEE, 18.04.2022

Summary: This paper introduces an end-to-end solution for a deep neural network (DNN) inference system. We customize and extend a family of deep-learning accelerators (DLAs) based on the NVIDIA open-source deep-learning accelerator (NVDLA). Our enhancements span both hardware and software. On the hardware side, a multiplier array is shared between high-efficiency regular and depth-wise convolutions. On the software side, a new DLA toolchain generates DLAs with various specifications and provides corresponding tests. Users can verify either trained or untrained DNN graphs to simulate accuracy and performance results. To address the hard-to-debug accuracy loss of integer inference, a hardware-aware, bitwise-accurate flow is proposed. The DLAs were verified using FPGA prototypes and silicon chips. The 65 nm test chip has 256 convolutional multipliers, a peak computing throughput of 200 GOPS, and a peak energy efficiency of 2.5 TOPS/W. The average power consumption is 65 mW when running an object-detection DNN model.
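The bitwise-accurate verification idea in the summary can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' actual toolchain: an int8 convolution window is computed with int32 accumulation, requantized by an arithmetic right shift, and a software "golden" result is compared bit-for-bit against a hardware-style reordered accumulation.

```python
# Hypothetical sketch (not the paper's toolchain): bitwise-exact checking
# of an int8 convolution, in the spirit of a hardware-aware,
# bitwise-accurate verification flow.

def conv_int8(acts, wts, shift):
    """3x3 single-channel conv window: int8 inputs, int32 accumulation,
    requantization by arithmetic right shift, saturation to int8."""
    acc = 0
    for a, w in zip(acts, wts):
        acc += a * w                      # int8 x int8 products fit in int32
    return max(-128, min(127, acc >> shift))

# Software "golden" result vs. a hardware-style reordered accumulation:
acts = [12, -7, 33, 0, 5, -120, 88, 4, -1]
wts  = [3, 1, -2, 4, 0, 7, -5, 2, 6]
golden = conv_int8(acts, wts, 4)
hw     = conv_int8(list(reversed(acts)), list(reversed(wts)), 4)
assert golden == hw   # integer MACs commute, so results agree bit-for-bit
```

The point of doing the cross-check in integer arithmetic is that reordering a floating-point accumulation can perturb low-order bits, whereas integer MACs commute exactly, so any mismatch between the software model and the hardware simulation pinpoints a real bug rather than rounding noise.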
ISSN:2472-9124
DOI:10.1109/VLSI-DAT54769.2022.9768062