Configurable Deep Learning Accelerator with Bitwise-accurate Training and Verification

Bibliographic Details
Published in: 2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), pp. 1-4
Main Authors: Luo, Shien-Chun; Chang, Kuo-Chiang; Chen, Po-Wei; Chen, Zhao-Hong
Format: Conference Proceeding
Language: English
Published: IEEE, 18.04.2022

Summary: This paper introduces an end-to-end solution for a deep neural network (DNN) inference system. We customize and extend a family of deep-learning accelerators (DLAs) based on the NVIDIA open-source deep-learning accelerator (NVDLA). Our enhancements span both hardware and software. On the hardware side, a multiplier array is shared between high-efficiency regular and depth-wise convolutions. On the software side, a new DLA toolchain generates DLAs with various specifications and provides corresponding tests. Users can verify either trained or untrained DNN graphs to simulate accuracy and performance results. To address the hard-to-debug accuracy loss of integer inference, a hardware-aware, bitwise-accurate flow is proposed. The DLAs were verified using FPGA prototypes and silicon chips. The 65 nm test chip has 256 convolutional multipliers, a peak computing throughput of 200 GOPS, and a peak energy efficiency of 2.5 TOPS/W. The average power consumption is 65 mW when running an object-detection DNN model.
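The bitwise-accurate verification idea in the summary can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' actual toolchain: an int8 convolution window is computed with int32 accumulation, requantized by an arithmetic right shift, and a software "golden" result is compared bit-for-bit against a hardware-style reordered accumulation.

```python
# Hypothetical sketch (not the paper's toolchain): bitwise-exact checking
# of an int8 convolution, in the spirit of a hardware-aware,
# bitwise-accurate verification flow.

def conv_int8(acts, wts, shift):
    """3x3 single-channel conv window: int8 inputs, int32 accumulation,
    requantization by arithmetic right shift, saturation to int8."""
    acc = 0
    for a, w in zip(acts, wts):
        acc += a * w                      # int8 x int8 products fit in int32
    return max(-128, min(127, acc >> shift))

# Software "golden" result vs. a hardware-style reordered accumulation:
acts = [12, -7, 33, 0, 5, -120, 88, 4, -1]
wts  = [3, 1, -2, 4, 0, 7, -5, 2, 6]
golden = conv_int8(acts, wts, 4)
hw     = conv_int8(list(reversed(acts)), list(reversed(wts)), 4)
assert golden == hw   # integer MACs commute, so results agree bit-for-bit
```

The point of doing the cross-check in integer arithmetic is that reordering a floating-point accumulation can perturb low-order bits, whereas integer MACs commute exactly, so any mismatch between the software model and the hardware simulation pinpoints a real bug rather than rounding noise.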
ISSN:2472-9124
DOI:10.1109/VLSI-DAT54769.2022.9768062