An Overlap-and-Add Based Time Domain Acceleration of CNNs on FPGA-CPU Systems

Bibliographic Details
Published in: VLSI Design and Test, pp. 573-583
Main Authors: Singh, Rudresh Pratap; Kumar, Shreyam; Pandey, Jai Gopal
Format: Book Chapter
Language: English
Published: Cham, Springer Nature Switzerland, 2022
Series: Communications in Computer and Information Science

Summary: Convolutional neural networks (CNNs) have become widespread in image recognition and are widely implemented in modern facial recognition systems. As the use of CNNs grows, their run-time speed becomes critical for real-world systems. Traditional FPGA-based acceleration requires either a large on-chip memory or high memory bandwidth, with long memory access times. We present an algorithm and a corresponding hardware design for computing CNNs using an overlap-and-add-based technique in the time domain. In the proposed algorithm, the input image is broken into tiles that can be processed independently, without the overhead of computing in the frequency domain. This also enables efficient concurrent execution of the convolution process, which results in higher throughput and lower power consumption. At the same time, we maintain the low on-chip memory requirements needed to fabricate faster and cheaper processor designs. We implement the VGG16 and AlexNet CNN models with our design on Xilinx Virtex-7 and Zynq boards. Performance analysis shows that our design provides 0.48× better throughput than the state-of-the-art AlexNet implementation and uses 0.15× fewer multipliers and other resources than the state-of-the-art VGG16 implementation.
ISBN: 9783031215131, 3031215133
ISSN: 1865-0929, 1865-0937
DOI: 10.1007/978-3-031-21514-8_47
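The summary describes splitting the input image into tiles that are convolved independently and then combined by overlap-and-add in the time (spatial) domain, with no transform to the frequency domain. The following is a minimal Python sketch of generic 2-D overlap-add convolution, offered only to illustrate the principle; it is not the authors' algorithm or hardware design, and the function names `conv2d_full` and `conv2d_overlap_add` and the `tile` parameter are hypothetical. Each tile's full convolution is larger than the tile by (K-1) rows and columns, and those borders overlap the results of neighbouring tiles, where they are summed. Because convolution is linear and shift-invariant, the sum equals the convolution of the whole image. CNN layers usually compute cross-correlation, which is convolution with a flipped kernel, so the same tiling argument applies.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_full(x: Matrix, k: Matrix) -> Matrix:
    """Direct 'full' 2-D linear convolution; output is (H+KH-1) x (W+KW-1)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = [[0.0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    # Scatter each input pixel, weighted by the kernel.
                    out[i + u][j + v] += x[i][j] * k[u][v]
    return out


def conv2d_overlap_add(x: Matrix, k: Matrix, tile: int) -> Matrix:
    """Hypothetical tiled convolution: convolve tiles independently, then add."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = [[0.0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            # Slice one tile; tiles on the right/bottom edge may be smaller.
            block = [row[c0:c0 + tile] for row in x[r0:r0 + tile]]
            # Tiles are independent, so this call could run concurrently.
            partial = conv2d_full(block, k)
            # Add the partial result at the tile's offset; its (K-1)-wide
            # border overlaps neighbouring tiles' results and is summed.
            for i, prow in enumerate(partial):
                for j, val in enumerate(prow):
                    out[r0 + i][c0 + j] += val
    return out


if __name__ == "__main__":
    image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
    kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
    reference = conv2d_full(image, kernel)
    for t in (1, 2, 3, 5):
        assert conv2d_overlap_add(image, kernel, t) == reference
    print("overlap-add matches direct convolution")
```

The integer-valued test data keeps every floating-point sum exact, so the tiled and direct results can be compared with `==`. In a hardware setting, the tile size would trade on-chip buffer size against the extra (K-1)-wide border additions, but that trade-off is an assumption here, not a figure taken from the chapter.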