An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs

Bibliographic Details
Published in IEEE transactions on very large scale integration (VLSI) systems Vol. 30; no. 11; pp. 1587-1600
Main Authors Li, Shengzhao; Wang, Qin; Jiang, Jianfei; Sheng, Weiguang; Jing, Naifeng; Mao, Zhigang
Format Journal Article
Language English
Published New York IEEE 01.11.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Convolutional neural networks (CNNs) have had great success when applied to computer vision technology, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on the acceleration of a single input, and they are not particularly optimized for video applications. In this article, we focus on the similarities between continuous inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation of similar data between continuous inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network to further improve the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared to the original network without Winograd algorithm and incremental operation, our design provides a 4.10× speedup. When compared with other YOLO network FPGA accelerators applied to video applications, our design provided a 3.13× to 18.34× normalized digital signal processor (DSP) efficiency and 1.10× to 14.2× energy efficiency.
ISSN: 1063-8210, 1557-9999
DOI: 10.1109/TVLSI.2022.3151788
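
Illustrative note: the summary says the accelerator skips the convolution of similar data between consecutive inputs. The record does not describe how the hardware does this, so the following is only a minimal software sketch of the general idea, resting on the linearity of convolution: conv(current) = conv(previous) + conv(current - previous), so pixels whose difference is zero contribute nothing and can be skipped. The function names, the `threshold` parameter, and the toy frames are assumptions made for illustration and are not taken from the article.

```python
def conv2d_valid(x, k):
    """Plain 'valid' 2-D correlation on lists of lists (reference result)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def incremental_conv2d(prev_out, prev_x, cur_x, k, threshold=0):
    """Update the previous frame's output using only the changed pixels.

    Hypothetical helper: `threshold` is an assumed knob. With threshold=0
    the result equals a full convolution of cur_x; a larger value skips
    more pixels but makes the result approximate.
    """
    kh, kw = len(k), len(k[0])
    oh, ow = len(prev_out), len(prev_out[0])
    out = [row[:] for row in prev_out]  # start from the previous result
    skipped = 0
    for r in range(len(cur_x)):
        for c in range(len(cur_x[0])):
            delta = cur_x[r][c] - prev_x[r][c]
            if abs(delta) <= threshold:
                skipped += 1  # similar pixel: no multiply-accumulate work
                continue
            # Scatter the change to every output whose window covers (r, c).
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < oh and 0 <= j < ow:
                        out[i][j] += delta * k[a][b]
    return out, skipped


# Toy data (assumed): two 5x5 "frames" that differ in a single pixel.
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
frame0 = [[(3 * r + c) % 7 for c in range(5)] for r in range(5)]
frame1 = [row[:] for row in frame0]
frame1[2][3] += 4

out0 = conv2d_valid(frame0, kernel)
out1, skipped = incremental_conv2d(out0, frame0, frame1, kernel)
```

In this toy case 24 of the 25 input pixels are unchanged and are skipped, yet `out1` matches a full convolution of `frame1`. The identity holds for a single linear convolution layer only; how the article carries the idea through nonlinear activations and pooling is not stated in this record.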