High PE Utilization CNN Accelerator with Channel Fusion Supporting Pattern-Compressed Sparse Neural Networks

Bibliographic Details
Published in: 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main Authors: Wang, Jingyu; Yu, Songming; Yue, Jinshan; Yuan, Zhe; Yuan, Zhuqing; Yang, Huazhong; Li, Xueqing; Liu, Yongpan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2020
Summary: Recently, CNN-based methods have made remarkable progress across a broad range of fields. Both network pruning algorithms and hardware accelerators have been introduced to accelerate CNNs. However, existing pruning algorithms have not fully studied the pattern pruning method, and the current index storage schemes of sparse CNNs are inefficient. Furthermore, the performance of existing accelerators suffers from no-load PEs on sparse networks. This work proposes a software-hardware co-design to address these problems. The software includes an ADMM-based method that compresses the patterns of convolution kernels with acceptable accuracy loss, and a Huffman encoding method that reduces index storage overhead. The hardware is a fusion-enabled systolic architecture that reduces the PEs' no-load rate and improves performance by supporting channel fusion. On CIFAR-10, this work achieves a 5.63x index storage reduction, using 2-7 patterns across layers, with 0.87% top-1 accuracy loss. Compared with a state-of-the-art accelerator, it achieves 1.54x-1.79x performance and a 25%-34% reduction in no-load rate with reasonable area and power overheads.
DOI: 10.1109/DAC18072.2020.9218630
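
The summary names two software techniques: pattern pruning, where each convolution kernel keeps only the weights selected by one mask from a small shared pattern set, and Huffman coding of the per-kernel pattern indices, which is what shrinks index storage. Below is a minimal NumPy sketch of both, not the authors' implementation: the four-pattern library, the L1-magnitude pattern assignment, and all function names are illustrative assumptions, and the ADMM retraining and the fusion-enabled systolic hardware are out of scope.

```python
import heapq
from collections import Counter
from itertools import count

import numpy as np

# Hypothetical pattern library: four 3x3 masks, each keeping 4 of 9 weights.
# The paper uses 2-7 patterns per layer; these shapes are illustrative only.
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=float)

def assign_patterns(kernels):
    """Give each 3x3 kernel the pattern that preserves the most L1 weight.
    kernels: (N, 3, 3). Returns (masked kernels, pattern index per kernel).
    The ADMM retraining that recovers accuracy after masking is omitted."""
    mags = np.abs(kernels)
    scores = np.einsum('nij,pij->np', mags, PATTERNS)  # kept L1 mass per pattern
    idx = scores.argmax(axis=1)
    return kernels * PATTERNS[idx], idx

def huffman_code(symbols):
    """Build a Huffman code over the pattern-index stream.
    Frequent patterns get short codewords, shrinking index storage."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol stream
        return {next(iter(freq)): '0'}
    tie = count()                           # unique tiebreaker; dicts never compared
    heap = [(n, next(tie), {s: ''}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)     # merge the two least-frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]

rng = np.random.default_rng(0)
kernels = rng.standard_normal((64, 3, 3))   # one layer's worth of 3x3 kernels
pruned, idx = assign_patterns(kernels)
code = huffman_code(idx.tolist())
huff_bits = sum(len(code[s]) for s in idx.tolist())
print(f'Huffman index bits: {huff_bits}  vs  fixed 2-bit indices: {2 * len(idx)}')
```

On random weights the pattern indices come out roughly uniform, so the Huffman code saves little; the 5.63x reduction reported in the paper comes from the skewed pattern distributions of trained, ADMM-regularized networks.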