Configurable Deep Learning Accelerator with Bitwise-accurate Training and Verification
| Published in | 2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-4 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 18.04.2022 |
| Summary | This paper introduces an end-to-end solution for a deep neural network (DNN) inference system. We customize and extend a family of deep-learning accelerators (DLAs) based on the NVIDIA open-source deep-learning accelerator (NVDLA). Our enhancements span both hardware and software. On the hardware side, a multiplier array is shared between high-efficiency regular and depth-wise convolutions. On the software side, a new DLA toolchain generates various DLA specifications and provides corresponding tests; users can verify either trained or untrained DNN graphs to simulate accuracy and performance results. To address the hard-to-debug accuracy loss of integer inference, a hardware-aware, bitwise-accurate flow is proposed. The DLAs were verified with FPGA prototypes and silicon chips. The 65 nm test chip has 256 convolutional multipliers and delivers 200 GOPS peak computing power at 2.5 TOPS/W peak energy efficiency. Average power consumption is 65 mW when running an object-detection DNN model. |
|---|---|
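As an illustrative sanity check (not part of the paper), the reported peak throughput and peak energy efficiency together imply a peak power figure: 200 GOPS at 2.5 TOPS/W corresponds to 0.2 TOPS / 2.5 TOPS/W = 0.08 W, i.e. 80 mW. The function name below is hypothetical and only illustrates the unit arithmetic.

```python
# Illustrative arithmetic only (not from the paper): relate the reported
# peak throughput (GOPS) and energy efficiency (TOPS/W) to an implied
# peak power in milliwatts.

def implied_peak_power_mw(peak_gops: float, efficiency_tops_per_w: float) -> float:
    """Peak power (mW) implied by throughput (GOPS) and efficiency (TOPS/W)."""
    peak_tops = peak_gops / 1000.0               # 200 GOPS -> 0.2 TOPS
    peak_w = peak_tops / efficiency_tops_per_w   # 0.2 / 2.5 = 0.08 W
    return peak_w * 1000.0                       # 0.08 W -> 80 mW

print(implied_peak_power_mw(200, 2.5))  # 80.0
```

The implied 80 mW peak is consistent with the abstract's reported 65 mW average power while running an object-detection model, since average power under a real workload should sit below the peak figure.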
| ISSN | 2472-9124 |
| DOI | 10.1109/VLSI-DAT54769.2022.9768062 |