Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA
The performance of recent CNN accelerators falls behind their peak performance because they fail to maximize parallel computation in every convolutional layer from the parallelism that varies throughout the CNN. Furthermore, the exploitation of multiple parallelisms may reduce calculation-skip abili...
Saved in:
Published in | IPSJ Transactions on System LSI Design Methodology Vol. 12; pp. 22 - 37 |
---|---|
Main Authors | , , , , , , , |
Format | Journal Article |
Language | English |
Published |
Tokyo
Information Processing Society of Japan
01.01.2019
Japan Science and Technology Agency |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | The performance of recent CNN accelerators falls behind their peak performance because they fail to maximize parallel computation in every convolutional layer from the parallelism that varies throughout the CNN. Furthermore, the exploitation of multiple parallelisms may reduce calculation-skip ability. This paper proposes a convolution core for sparse CNN that leverages multiple types of parallelism and weight sparsity efficiently to achieve high performance. It alternates dataflow and scheduling of parallel computation according to the available parallelism of each convolutional layer by exploiting both intra- and inter-output parallelism to maximize multiplier utilization. In addition, it eliminates redundant multiply-accumulate (MACC) operations due to weight sparsity. The proposed convolution core enables both abilities with ease of dataflow control by using a parallelism controller for scheduling parallel MACCs on the processing elements (PEs) and a weight broadcaster for broadcasting non-zero weights to the PEs according to the scheduling. The proposed convolution core was evaluated on 13 convolutional layers in a sparse VGG-16 benchmark. It outperforms the baseline architecture for dense CNN that exploits intra-output parallelism by 4x speedup. It achieves 3x effective GMACS over prior arts of CNN accelerator in total performance. |
---|---|
ISSN: | 1882-6687 1882-6687 |
DOI: | 10.2197/ipsjtsldm.12.22 |