An Overlap-and-Add Based Time Domain Acceleration of CNNs on FPGA-CPU Systems
Published in | VLSI Design and Test pp. 573 - 583 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Cham: Springer Nature Switzerland, 2022 |
Series | Communications in Computer and Information Science |
Subjects | |
Summary: | Convolutional neural networks (CNNs) have become widespread in the area of image recognition and are widely implemented in modern facial recognition systems. With the increasing use of CNNs, their run-time speed becomes critical for faster real-world systems. Traditional FPGA-based acceleration requires either large on-chip memory or high bandwidth and memory access time. We present an algorithm and subsequent hardware design for computing CNN using an overlap-and-add-based technique in the time domain. In the proposed algorithm, the input image is broken into tiles which can be processed independently without involving the overheads of computing in the frequency domain. This also allows efficient concurrency of the convolution process which results in higher throughput and lower power consumption. At the same time, we maintain the low on-chip memory requirements necessary for the fabrication of faster and cheaper processor designs. We implement CNN VGG16 and AlexNet models with our design on the Xilinx Virtex 7 and Zynq boards. Performance analysis of our design provides a 0.48× better throughput than the state-of-the-art AlexNet and uses 0.15× lesser multipliers and other resources than the state-of-the-art VGG16. |
---|---|
ISBN: | 9783031215131 3031215133 |
ISSN: | 1865-0929 1865-0937 |
DOI: | 10.1007/978-3-031-21514-8_47 |
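
The summary describes splitting the input image into tiles that are convolved independently, with no transform to the frequency domain. The following is a minimal software sketch of that overlap-and-add principle only, not the chapter's hardware design: the function names, the `tile` parameter, and the sample data are assumptions made for illustration. It uses true convolution (flipped kernel); the same tiling argument holds for the cross-correlation used in most CNN layers, because both operations are linear.

```python
def conv2d_full(image, kernel):
    """Direct full 2D convolution; output is (H + kh - 1) x (W + kw - 1)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    out[i + u][j + v] += image[i][j] * kernel[u][v]
    return out


def conv2d_overlap_add(image, kernel, tile):
    """Same result as conv2d_full, computed tile by tile (hypothetical helper)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            # Each tile is convolved on its own; tiles do not depend on each other.
            block = [row[c0:c0 + tile] for row in image[r0:r0 + tile]]
            partial = conv2d_full(block, kernel)
            # A partial result spills kh - 1 rows and kw - 1 columns past its tile.
            # Summing these overlapping borders reconstructs the full output.
            for i, prow in enumerate(partial):
                for j, val in enumerate(prow):
                    out[r0 + i][c0 + j] += val
    return out


# Illustrative data only: a 5 x 6 integer image and a 3 x 3 Sobel-like kernel.
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert conv2d_overlap_add(image, kernel, tile=2) == conv2d_full(image, kernel)
```

Because each tile needs only its own pixels and the kernel, the per-tile working set stays small and the tiles can be dispatched concurrently, which is consistent with the memory and concurrency claims in the summary.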