An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs
Published in | IEEE transactions on very large scale integration (VLSI) systems Vol. 30; no. 11; pp. 1587 - 1600 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | New York, IEEE, 01.11.2022, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | Convolutional neural networks (CNNs) have had great success when applied to computer vision technology, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on the acceleration of a single input, and they are not particularly optimized for video applications. In this article, we focus on the similarities between continuous inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operation of similar data between continuous inputs. We also use the Winograd algorithm to optimize the conv3×3 operator in the YOLOv3-tiny network to further improve the accelerator's efficiency. Experimental results show that our accelerator achieved 74.2 frames/s on ImageNet ILSVRC2015. Compared to the original network without Winograd algorithm and incremental operation, our design provides a 4.10× speedup. When compared with other YOLO network FPGA accelerators applied to video applications, our design provided a 3.13× to 18.34× normalized digital signal processor (DSP) efficiency and 1.10× to 14.2× energy efficiency. |
---|---|
ISSN: | 1063-8210, 1557-9999 |
DOI: | 10.1109/TVLSI.2022.3151788 |
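
The summary names two techniques without giving their details: incremental operation across consecutive frames and the Winograd algorithm for 3×3 convolution. The sketch below illustrates both ideas in one dimension only, as an assumption-laden toy rather than the accelerator's actual design. The function names (`conv1d_direct`, `winograd_f23`, `incremental_conv1d`) and the `threshold` parameter are hypothetical. The incremental part relies on the linearity of convolution: the output for a new frame equals the previous output plus the convolution of the frame difference, so unchanged samples cost no multiplies. The Winograd part is the standard F(2,3) minimal filtering form, which yields two outputs of a 3-tap filter with four multiplies instead of six.

```python
def conv1d_direct(x, w):
    """Valid 1-D correlation: y[i] = sum_k x[i + k] * w[k]."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def winograd_f23(d, g):
    """Winograd F(2,3): two outputs from a 4-sample tile d and a 3-tap filter g.

    Uses four element-wise multiplies (m1..m4). The filter-side sums would be
    precomputed once per filter in practice.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def incremental_conv1d(prev_x, prev_y, x, w, threshold=0.0):
    """Update prev_y for a new input x by processing only changed samples.

    Assumption: a sample counts as "similar" when its absolute change is at
    most `threshold`. With threshold=0 the result matches a full recompute;
    a larger threshold trades accuracy for fewer multiplies.
    """
    y = list(prev_y)
    for j, (old, new) in enumerate(zip(prev_x, x)):
        delta = new - old
        if abs(delta) <= threshold:
            continue  # similar sample: skip its contribution entirely
        # Sample j contributes delta * w[k] to output position j - k.
        for k in range(len(w)):
            i = j - k
            if 0 <= i < len(y):
                y[i] += delta * w[k]
    return y


if __name__ == "__main__":
    w = [1, 0, -1]
    frame0 = [1, 2, 3, 4, 5, 6]
    frame1 = [1, 2, 9, 4, 5, 6]  # only one sample differs from frame0
    y0 = conv1d_direct(frame0, w)
    print(incremental_conv1d(frame0, y0, frame1, w))
    print(conv1d_direct(frame1, w))
    print(winograd_f23([1, 2, 3, 4], [1, 2, 3]))
```

In this toy, the incremental update touches one changed sample out of six, while the full recompute touches all of them. How the published accelerator decides similarity, and how it combines skipping with the 2-D Winograd transform, is not stated in this record.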