Deep Neural Network Acceleration With Sparse Prediction Layers

Bibliographic Details
Published in: IEEE Access, Vol. 8, pp. 6839-6848
Main Authors: Yao, Zhongtian; Huang, Kejie; Shen, Haibin; Ming, Zhaoyan
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020

Summary: The ever-increasing computation cost of Convolutional Neural Networks (CNNs) makes it imperative for real-world applications to accelerate the key steps, especially inference. In this work, we propose an efficient yet general scheme called the Sparse Prediction Layer (SPL), which can predict and skip the trivial elements in a CNN layer. Pruned weights are used to predict the locations of the maximum values in max-pooling kernels and of the positive values before Rectified Linear Units (ReLUs). The precise values of these predicted important elements are then calculated selectively, and the complete outputs are restored from them. Our experiments on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2012 show that SPL can reduce Floating-point Operations (FLOPs) by 68.3%, 58.6% and 59.5% on AlexNet, VGG-16 and ResNet-50, respectively, with an accuracy loss of less than 1% and without retraining. The proposed SPL scheme can further accelerate networks already compressed by other pruning-based methods, for example a FLOP reduction of 50.2% on a ResNet-50 that has been pruned by Channel Pruning (CP) before SPLs are applied. A special matrix multiplication called Sparse Result Matrix Multiplication (SRMM) is proposed to support the implementation of SPL, and its acceleration effect is in line with expectations.
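The summary describes the SPL mechanism only at a high level. The NumPy sketch below (not taken from the paper) illustrates the predict-then-selectively-recompute idea for a convolution followed by ReLU, assuming an im2col input layout and a magnitude-pruned copy of the weights; the function and variable names are hypothetical, and the dense gather used for the exact recomputation stands in for a specialized kernel such as the paper's SRMM.

```python
import numpy as np

def spl_conv_relu(x_cols, w_full, w_pruned):
    """Illustrative sketch of the SPL idea for a convolution + ReLU.

    x_cols:   (num_patches, patch_dim) im2col-style input patches
    w_full:   (patch_dim, out_channels) original dense weights
    w_pruned: (patch_dim, out_channels) pruned copy of w_full (mostly zeros)
    """
    # 1) Cheap prediction pass using the pruned (sparse) weights.
    y_pred = x_cols @ w_pruned

    # 2) Predict which outputs will survive the ReLU (i.e., be positive).
    keep = y_pred > 0

    # 3) Recompute only the predicted-important elements exactly.
    #    A real implementation would use a sparse-result kernel (SRMM)
    #    instead of this dense gather.
    y = np.zeros_like(y_pred)
    rows, cols = np.nonzero(keep)
    y[rows, cols] = np.einsum('ij,ij->i', x_cols[rows], w_full[:, cols].T)

    # 4) ReLU on the restored outputs; predicted-negative positions stay zero.
    return np.maximum(y, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 27))          # 16 patches, 3x3x3 receptive field
    w = rng.standard_normal((27, 8))           # 8 output channels
    w_p = np.where(np.abs(w) > 0.8, w, 0.0)    # crude magnitude pruning for the demo
    print(spl_conv_relu(x, w, w_p).shape)      # (16, 8)
```

The max-pooling case described in the summary would work analogously: within each pooling window, the argmax of the cheap prediction selects a single candidate, and only that element is recomputed exactly with the full weights.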
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2963941