Design and Implementation of a Universal Shift Convolutional Neural Network Accelerator


Bibliographic Details
Published in: IEEE Embedded Systems Letters, Vol. 16, No. 1, pp. 17-20
Main Authors: Song, Qingzeng; Cui, Weizhi; Sun, Liankun; Jin, Guanghao
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2024

Summary: Currently, many applications implement convolutional neural networks (CNNs) on CPUs or GPUs, but their performance is limited by the computational complexity of these networks. Compared with CPU or GPU implementations, deploying convolutional neural accelerators on FPGAs can achieve superior performance. On the other hand, the multiplication operations of CNNs have been a constraint on FPGAs reaching better performance. In this letter, we propose a shift CNN accelerator, which converts the multiplication operations into shift operations. Based on the shift operation, our accelerator can break the computational bottleneck of FPGAs. On a Virtex UltraScale+ VU9P, our accelerator saves DSP resources and reduces memory consumption while achieving a performance of 1.18 Tera Operations Per Second (TOPS), an essential improvement over other convolutional neural accelerators.
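The core idea summarized above, replacing multiplications with shifts, typically relies on quantizing weights to signed powers of two, so that multiplying an activation by a weight reduces to a bit shift. The sketch below illustrates that general technique in Python; it is a hypothetical illustration under that assumption, not the authors' exact quantization or hardware scheme.

```python
import math

def quantize_to_pow2(w):
    """Round a real-valued weight to the nearest signed power of two.

    Returns (sign, exp) such that w is approximated by sign * 2**exp.
    (Hypothetical helper, not taken from the letter.)
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))  # nearest exponent in log domain
    return sign, exp

def shift_multiply(x, sign, exp):
    """Multiply an integer activation x by sign * 2**exp using shifts only.

    A positive exponent is a left shift; a negative exponent is an
    arithmetic right shift. No multiplier (and hence, on an FPGA,
    no DSP block) is needed for the product itself.
    """
    if exp >= 0:
        return sign * (x << exp)
    return sign * (x >> -exp)

# Example: a weight of 0.48 quantizes to 2**-1, so multiplying the
# activation 12 by it becomes a single right shift: 12 >> 1 = 6.
sign, exp = quantize_to_pow2(0.48)
print(shift_multiply(12, sign, exp))  # -> 6
```

In hardware, each such shift-and-add replaces a full multiplier, which is consistent with the DSP savings the abstract reports; the exact bit widths and quantization policy would follow the accelerator design in the letter.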
ISSN: 1943-0663, 1943-0671
DOI: 10.1109/LES.2022.3233796