An Overlap-and-Add Based Time Domain Acceleration of CNNs on FPGA-CPU Systems

Bibliographic Details
Published in	VLSI Design and Test, pp. 573-583
Main Authors	Singh, Rudresh Pratap; Kumar, Shreyam; Pandey, Jai Gopal
Format	Book Chapter
Language	English
Published	Cham: Springer Nature Switzerland, 2022
Series	Communications in Computer and Information Science
Subjects	Convolution neural networks (CNNs); FPGA-CPU; Accelerators; Hardware architectures; Performance analysis

More Information
Summary:	Convolutional neural networks (CNNs) have become widespread in image recognition and are widely implemented in modern facial recognition systems. With the increasing use of CNNs, their run-time speed becomes critical for faster real-world systems. Traditional FPGA-based acceleration requires either large on-chip memory or high bandwidth and memory access time. We present an algorithm and a corresponding hardware design for computing CNNs using an overlap-and-add-based technique in the time domain. In the proposed algorithm, the input image is broken into tiles that can be processed independently, without the overhead of computing in the frequency domain. This also allows efficient concurrency in the convolution process, which results in higher throughput and lower power consumption. At the same time, we maintain the low on-chip memory requirements necessary for the fabrication of faster and cheaper processor designs. We implement the VGG16 and AlexNet CNN models with our design on the Xilinx Virtex 7 and Zynq boards. Performance analysis shows that our design provides 0.48× better throughput than the state-of-the-art AlexNet and uses 0.15× fewer multipliers and other resources than the state-of-the-art VGG16.
ISBN:	9783031215131 3031215133
ISSN:	1865-0929 1865-0937
DOI:	10.1007/978-3-031-21514-8_47
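
The tiling idea described in the summary can be illustrated with a small software sketch. This is not the authors' hardware design: it is a minimal NumPy example, assuming a single-channel image and a single 2D kernel, of how overlap-and-add works in the time (spatial) domain. The function names `full_conv2d` and `overlap_add_conv2d` and the `tile` parameter are hypothetical and chosen only for illustration. Each tile is convolved on its own, and the partial outputs, which overlap by the kernel size minus one, are summed into the final result.

```python
import numpy as np


def full_conv2d(x, k):
    """Direct 'full' 2D convolution computed in the spatial domain (no FFT)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1), dtype=np.float64)
    # Each kernel tap contributes a shifted, scaled copy of the input.
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += k[i, j] * x
    return out


def overlap_add_conv2d(x, k, tile):
    """Convolve x with k tile by tile and sum the overlapping partial outputs.

    Assumption for illustration: square tiles of side `tile`; edge tiles may
    be smaller. Every tile is independent, so the loop body could run
    concurrently.
    """
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1), dtype=np.float64)
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            block = x[r:r + tile, c:c + tile]
            part = full_conv2d(block, k)      # (bh + kh - 1) x (bw + kw - 1)
            ph, pw = part.shape
            out[r:r + ph, c:c + pw] += part   # the "add" step over the overlaps
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(-5, 5, size=(7, 9)).astype(np.float64)
    kernel = rng.integers(-2, 3, size=(3, 3)).astype(np.float64)
    tiled = overlap_add_conv2d(image, kernel, tile=4)
    direct = full_conv2d(image, kernel)
    print(np.allclose(tiled, direct))
```

Because convolution is linear, summing the per-tile results reproduces the convolution of the whole image, which is why the tiles need no shared state beyond the final accumulation. CNN layers typically use cross-correlation with cropped ("valid" or "same") outputs; that variant would flip the kernel and crop the result, which this sketch leaves out.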