A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs

Bibliographic Details
Published in: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 1-5
Main Authors: Hotfilter, Tim; Hoefer, Julian; Kres, Fabian; Kempf, Fabian; Kraft, Leonhard; Harbaum, Tanja; Becker, Jürgen
Format: Conference Proceeding
Language: English
Published: IEEE, 11.06.2023

Summary: A key challenge in computing convolutional neural networks (CNNs), besides the vast number of computations, is the large number of energy-intensive transactions from main to local memory. In this paper, we present our methodical approach to maximize and prune coarse-grained, regular, blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simpler and less resource-intensive pruning than irregular sparsity, saving memory transactions and computations. Our threshold-based technique maximizes the number of regular sparse blocks in each layer. The wide range of threshold combinations arising from the close correlation between the number of sparse blocks and network accuracy can be explored automatically by our exploration tool Spex. To harness the found sparse blocks for reducing memory transactions and MAC operations, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network accelerators. Sparse-Blox incurs up to 5× less area overhead than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluating our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows reductions of up to 18.9% and 12.6% in memory transfers and 802 M (19.0%) and 1.5 G (24.3%) MAC operations, respectively, at a 1% accuracy or 1 mAP drop.
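The abstract does not give implementation details, but the core idea of threshold-based blockwise activation pruning can be illustrated with a minimal sketch. The block shape, the max-magnitude criterion, the function name prune_blocks, and the example threshold below are assumptions for illustration only, not the authors' Spex or Sparse-Blox implementation.

```python
# Hypothetical sketch of threshold-based blockwise activation pruning,
# written from the abstract alone; block geometry, pruning criterion,
# and threshold value are illustrative assumptions.
import numpy as np


def prune_blocks(fmap: np.ndarray, block: int, threshold: float):
    """Zero out per-channel spatial blocks whose largest activation magnitude
    is below `threshold`.

    fmap:      activation feature map of shape (channels, height, width)
    block:     edge length of a square block, ideally matching the accelerator tile size
    threshold: per-layer pruning threshold (e.g., chosen by an accuracy-vs-sparsity sweep)
    Returns the pruned feature map and the number of blocks set to zero.
    """
    c, h, w = fmap.shape
    pruned = fmap.copy()
    zeroed = 0
    for ch in range(c):
        for y in range(0, h - h % block, block):
            for x in range(0, w - w % block, block):
                tile = pruned[ch, y:y + block, x:x + block]  # view into `pruned`
                if np.abs(tile).max() < threshold:
                    tile[...] = 0.0  # whole block becomes regular (skippable) sparsity
                    zeroed += 1
    return pruned, zeroed


# Example: a synthetic post-ReLU feature map with many near-zero regions
rng = np.random.default_rng(0)
fmap = np.maximum(rng.standard_normal((64, 56, 56)).astype(np.float32) - 2.0, 0.0)
pruned, n_zero = prune_blocks(fmap, block=8, threshold=0.05)
total = 64 * (56 // 8) ** 2
print(f"pruned {n_zero}/{total} blocks ({100.0 * n_zero / total:.1f}%)")
```

In a real dataflow accelerator the block size would be tied to the tile size of the systolic array or vector unit, so that every zeroed block maps directly to skipped memory transfers and MAC operations; the per-layer thresholds would then be swept against network accuracy, which is the exploration the abstract attributes to Spex.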
ISSN:2834-9857
DOI:10.1109/AICAS57966.2023.10168566