Design of fast and sparse accelerator for deep learning model based on FPGA

Bibliographic Details
Main Authors Li, Shaotong; Long, Yuhang
Format Conference Proceeding
Language English
Published SPIE 31.05.2023
Summary: Many CNN hardware accelerators have been designed to speed up the inference of deep neural network models. FPGA-based CNN inference accelerators can provide sufficient computing power with flexible data precision, lower energy consumption, and lower application cost, and have attracted wide attention for IoT terminal devices with limited computing power and energy budgets. However, although current FPGA-based CNN accelerators have greatly improved inference speed through various methods, most of these methods cannot be applied effectively in real terminal scenarios because of memory and energy constraints. In response, we designed an acceleration framework that takes both inference speed and energy consumption into account. To address the limited computing power of the terminal environment, the framework optimizes the large number of multiplications in the convolution operation, which consumes the most computing power in the CNN inference stage. It uses local caching and matrix transformation formulas, and skips operand pairs that contain zero values during computation, which further accelerates model inference while reducing energy consumption. Experimental results show that, compared with current advanced neural network accelerators, the design significantly improves computing power and also achieves a better energy efficiency ratio. Moreover, the method can be implemented on FPGAs and can also be migrated to other embedded terminals.
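The zero-skipping idea mentioned in the summary can be illustrated in software. The sketch below is not the authors' hardware design, which this record does not describe; it is a minimal pure-Python illustration, assuming a single-channel "valid" convolution and hypothetical function names (`conv2d_dense`, `conv2d_zero_skip`), of how skipping operand pairs that contain a zero reduces the number of multiplications without changing the result.

```python
def conv2d_dense(image, kernel):
    """Valid-mode 2D convolution (correlation form).

    Returns (output, number of multiplications performed).
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    mults = 0
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
                    mults += 1
            out[i][j] = acc
    return out, mults


def conv2d_zero_skip(image, kernel):
    """Same result as conv2d_dense, but skips any product with a zero operand."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # Keep only the nonzero weights, so zero weights are never visited.
    nonzero = [(u, v, w)
               for u, row in enumerate(kernel)
               for v, w in enumerate(row) if w != 0]
    out = [[0] * ow for _ in range(oh)]
    mults = 0
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for u, v, w in nonzero:
                x = image[i + u][j + v]
                if x == 0:
                    continue  # zero activation: the product would be zero
                acc += x * w
                mults += 1
            out[i][j] = acc
    return out, mults


# Illustrative inputs (made up for this sketch, not taken from the paper).
image = [
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [4, 0, 0, 2],
    [0, 1, 5, 0],
]
kernel = [
    [1, 0, -1],
    [0, 2, 0],
    [0, 0, 1],
]

dense_out, dense_mults = conv2d_dense(image, kernel)
skip_out, skip_mults = conv2d_zero_skip(image, kernel)
print("outputs match:", dense_out == skip_out)
print("multiplications:", dense_mults, "dense vs", skip_mults, "with zero skipping")
```

On hardware, the saved multiplications would translate into fewer active multiplier cycles, which is presumably where the speed and energy gains claimed in the summary come from; the actual savings depend on how sparse the weights and activations are.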
Bibliography: Conference Date: 2023-02-17 to 2023-02-19
Conference Location: Hangzhou, China
ISBN: 9781510666290
151066629X
ISSN: 0277-786X
DOI: 10.1117/12.2680554