High PE Utilization CNN Accelerator with Channel Fusion Supporting Pattern-Compressed Sparse Neural Networks

Bibliographic Details
Published in: 2020 57th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main Authors: Wang, Jingyu; Yu, Songming; Yue, Jinshan; Yuan, Zhe; Yuan, Zhuqing; Yang, Huazhong; Li, Xueqing; Liu, Yongpan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2020
Summary: Recently, CNN-based methods have made remarkable progress across a broad range of fields. Both network pruning algorithms and hardware accelerators have been introduced to accelerate CNNs. However, existing pruning algorithms have not fully studied the pattern pruning method, and the current index storage schemes of sparse CNNs are inefficient. Furthermore, the performance of existing accelerators suffers from no-load PEs on sparse networks. This work proposes a software-hardware co-design to address these problems. The software includes an ADMM-based method that compresses the patterns of convolution kernels with acceptable accuracy loss, and a Huffman encoding method that reduces index storage overhead. The hardware is a fusion-enabled systolic architecture that reduces the PEs' no-load rate and improves performance by supporting channel fusion. On CIFAR-10, this work achieves a 5.63x index storage reduction, using 2-7 patterns across layers, with 0.87% top-1 accuracy loss. Compared with a state-of-the-art accelerator, it achieves 1.54x-1.79x performance and a 25%-34% reduction in no-load rate with reasonable area and power overheads.
DOI: 10.1109/DAC18072.2020.9218630
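
The summary names two software techniques: pattern pruning, where each convolution kernel keeps only the weights selected by one mask from a small shared pattern set, and Huffman coding of the per-kernel pattern indices, which is what shrinks index storage. Below is a minimal NumPy sketch of both, not the authors' implementation: the four-pattern library, the L1-magnitude pattern assignment, and all function names are illustrative assumptions, and the ADMM retraining and the fusion-enabled systolic hardware are out of scope.

```python
import heapq
from collections import Counter
from itertools import count

import numpy as np

# Hypothetical pattern library: four 3x3 masks, each keeping 4 of 9 weights.
# The paper uses 2-7 patterns per layer; these shapes are illustrative only.
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=float)

def assign_patterns(kernels):
    """Give each 3x3 kernel the pattern that preserves the most L1 weight.
    kernels: (N, 3, 3). Returns (masked kernels, pattern index per kernel).
    The ADMM retraining that recovers accuracy after masking is omitted."""
    mags = np.abs(kernels)
    scores = np.einsum('nij,pij->np', mags, PATTERNS)  # kept L1 mass per pattern
    idx = scores.argmax(axis=1)
    return kernels * PATTERNS[idx], idx

def huffman_code(symbols):
    """Build a Huffman code over the pattern-index stream.
    Frequent patterns get short codewords, shrinking index storage."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate single-symbol stream
        return {next(iter(freq)): '0'}
    tie = count()                           # unique tiebreaker; dicts never compared
    heap = [(n, next(tie), {s: ''}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)     # merge the two least-frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c0.items()}
        merged.update({s: '1' + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]

rng = np.random.default_rng(0)
kernels = rng.standard_normal((64, 3, 3))   # one layer's worth of 3x3 kernels
pruned, idx = assign_patterns(kernels)
code = huffman_code(idx.tolist())
huff_bits = sum(len(code[s]) for s in idx.tolist())
print(f'Huffman index bits: {huff_bits}  vs  fixed 2-bit indices: {2 * len(idx)}')
```

On random weights the pattern indices come out roughly uniform, so the Huffman code saves little; the 5.63x reduction reported in the paper comes from the skewed pattern distributions of trained, ADMM-regularized networks.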