An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs

Bibliographic Details
Published in	IEEE Transactions on Very Large Scale Integration (VLSI) Systems Vol. 30; no. 11; pp. 1587-1600
Main Authors	Li, Shengzhao; Wang, Qin; Jiang, Jianfei; Sheng, Weiguang; Jing, Naifeng; Mao, Zhigang
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.11.2022
Subjects	Accelerators; Algorithms; Application specific integrated circuits; Artificial neural networks; Computer vision; Convolution; Convolutional neural network (CNN); Convolutional neural networks; Digital signal processing; Digital signal processors; Efficiency; Field programmable gate arrays; field-programmable gate array (FPGA) accelerator; incremental operation; input similarity; Microprocessors; Neural networks; Operators (mathematics); Quantization (signal); Signal processing algorithms; video applications; Videos; Winograd algorithm
Summary:	Convolutional neural networks (CNNs) have had great success when applied to computer vision technology, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on the acceleration of a single input, and they are not particularly optimized for video applications. In this article, we focus on the similarities between continuous inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation of similar data between continuous inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network to further improve the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared to the original network without Winograd algorithm and incremental operation, our design provides a 4.10× speedup. When compared with other YOLO network FPGA accelerators applied to video applications, our design provided a 3.13× to 18.34× normalized digital signal processor (DSP) efficiency and 1.10× to 14.2× energy efficiency.
ISSN:	1063-8210 1557-9999
DOI:	10.1109/TVLSI.2022.3151788
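
Illustration (not taken from the article): the summary says the accelerator skips the convolution of data that is similar between consecutive inputs. One way to picture this is through the linearity of convolution: conv(current frame) = conv(previous frame) + conv(current frame - previous frame), so positions where the difference is zero need no multiply-accumulate work. The sketch below is a minimal software model of that idea for a single convolution layer. It assumes exact integer pixel values, a "valid" (no padding) convolution, and no nonlinearity between frames; the function names `conv2d_valid` and `incremental_conv2d` are hypothetical, and nothing here describes the authors' hardware design.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D convolution (cross-correlation form) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
    return out


def incremental_conv2d(prev_out, prev_frame, cur_frame, kernel):
    """Update the previous output using only the pixels that changed.

    Returns (new_out, macs), where macs counts the multiply-accumulate
    operations actually performed. Unchanged pixels are skipped entirely.
    """
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(prev_out), len(prev_out[0])
    out = [row[:] for row in prev_out]  # start from the previous result
    macs = 0
    for y in range(len(cur_frame)):
        for x in range(len(cur_frame[0])):
            delta = cur_frame[y][x] - prev_frame[y][x]
            if delta == 0:
                continue  # identical input: no work for this pixel
            # Pixel (y, x) feeds output (y - a, x - b) through kernel[a][b].
            for a in range(kh):
                for b in range(kw):
                    i, j = y - a, x - b
                    if 0 <= i < oh and 0 <= j < ow:
                        out[i][j] += delta * kernel[a][b]
                        macs += 1
    return out, macs


# Hypothetical toy data: two 5x5 "frames" that differ in a single pixel.
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
prev_frame = [[(3 * r + c) % 7 for c in range(5)] for r in range(5)]
cur_frame = [row[:] for row in prev_frame]
cur_frame[2][2] += 4

prev_out = conv2d_valid(prev_frame, kernel)
new_out, macs = incremental_conv2d(prev_out, prev_frame, cur_frame, kernel)
full_macs = len(prev_out) * len(prev_out[0]) * len(kernel) * len(kernel[0])
print("incremental MACs:", macs, "of", full_macs, "for a full recomputation")
```

In this toy case the incremental update reproduces the full result for the new frame while touching only the outputs that depend on the changed pixel. In a real network the saving depends on how many inputs are actually identical between frames, and activation functions and pooling between layers mean the difference cannot simply be propagated through the whole network this way.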