A Stride-Based Convolution Decomposition Method to Stretch CNN Acceleration Algorithms for Efficient and Flexible Hardware Implementation

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 67, No. 9, pp. 3007-3020
Main Authors: Yang, Chen; Wang, Yizhou; Wang, Xiaoli; Geng, Li
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2020

Summary: To reduce multiplication operations in the convolutions of convolutional neural networks (CNNs), three convolutional acceleration algorithms are widely used: Winograd, FFT, and FFA. However, current accelerators based on these algorithms have issues with flexibility and efficiency. First, some accelerators combine these acceleration algorithms and employ multiple types of computational units to obtain their respective advantages; as a result, some computational units sit idle while the best-performing unit is working, which causes considerable area inefficiency. Second, current accelerators tend to choose small algorithm parameters to avoid unacceptable precision loss; consequently, they can hardly support large kernel sizes and lack flexibility. Third, these acceleration algorithms are typically formulated for stride-1 convolutions, so few implementations consider the acceleration of large-stride convolutions, which is a major restriction on hardware flexibility. This paper proposes a stride-based convolution decomposition method (SCDM) to reform different convolution shapes (i.e., kernel sizes and strides) into an identical pattern. With the aid of SCDM, a Winograd-stretched and hardware-efficient design (WHD) is presented that uses one uniform computational unit to accelerate different convolution shapes, combining the complementary performance advantages of Winograd F(4,3) and F(4,2) units. Compared with current FFT-based or FFA-based works, WHD stretches the usable range of Winograd and simplifies implementation, thereby achieving both hardware flexibility and efficiency. Evaluation results show operation reductions of 34.08%~55.41% on six CNN models, while incurring only a slight hardware overhead.
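
The paper's central move, as the summary describes it, is to reform large-stride convolutions into the stride-1 pattern that Winograd units expect. A standard way to realize such a stride-based decomposition is to split the input and kernel into polyphase components, so that one stride-s convolution becomes a sum of s stride-1 convolutions. Below is a minimal 1-D NumPy sketch of that idea under a cross-correlation convention; the function names are illustrative assumptions, not the paper's actual SCDM implementation or hardware mapping.

import numpy as np

def conv1d(x, w, stride=1):
    # Direct 1-D convolution (cross-correlation) with the given stride.
    n_out = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride : i * stride + len(w)], w)
                     for i in range(n_out)])

def conv1d_stride_decomposed(x, w, stride):
    # Rewrite one stride-s convolution as a sum of s stride-1 convolutions
    # over polyphase components, so a stride-1 engine (e.g., a Winograd
    # unit) can process every piece.
    y = None
    for p in range(stride):
        x_p = x[p::stride]          # p-th polyphase component of the input
        w_p = w[p::stride]          # p-th polyphase component of the kernel
        y_p = conv1d(x_p, w_p, 1)   # plain stride-1 convolution
        # Trim to the common length before accumulating (phases of an
        # odd-sized kernel yield outputs that differ in length by one).
        y = y_p if y is None else y[:len(y_p)] + y_p[:len(y)]
    return y

x = np.random.randn(32)
w = np.random.randn(5)   # e.g., a 5-tap kernel applied with stride 2
direct = conv1d(x, w, stride=2)
decomposed = conv1d_stride_decomposed(x, w, stride=2)
assert np.allclose(direct, decomposed[:len(direct)])

For context on why the design pairs F(4,3) with F(4,2): a 1-D Winograd F(m, r) tile produces m outputs from m + r - 1 multiplications, so F(4,3) needs 6 multiplications where direct convolution needs 12, and F(4,2) needs 5 where direct needs 8; kernels decomposed into size-3 and size-2 pieces can therefore each be routed to a matching unit.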
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2020.2985727