Compressing Convolutional Neural Networks by L0 Regularization
Published in | 2019 International Conference on Control, Artificial Intelligence, Robotics & Optimization (ICCAIRO), pp. 155-162 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2019 |
Subjects | |
Summary: | Convolutional Neural Networks have recently taken over the field of image processing, because they can handle complex, non-algorithmic problems with state-of-the-art results in terms of precision and inference time. However, there are many environments (e.g. cell phones, IoT, embedded systems, etc.) and use-cases (e.g. pedestrian detection in autonomous driving assistant systems) where hard real-time requirements can only be satisfied by efficient use of computational resources. The general trend is to train larger and more complex networks in order to achieve better accuracy, and to force these networks to be redundant in order to increase their generalization ability. However, this produces networks that cannot be used in such scenarios. Pruning methods try to solve this problem by reducing the size of the trained neural networks: they eliminate the redundant computations after training, which usually causes a large drop in accuracy. In this paper, we propose new regularization techniques that induce sparsity of the parameters during training, so that the network can be pruned efficiently. From this viewpoint, we analyse and compare the effect of minimizing different norms of the weights (L1, L0), both for individual weights and for groups of them (kernels and channels). L1 regularization can be optimized by Gradient Descent, but this is not true for L0; the paper therefore proposes a combination of Proximal Gradient Descent optimization and the RMSProp method to solve the resulting optimization problem (an illustrative sketch of such a proximal step follows this record). Our results demonstrate that the proposed L0 minimization-based regularization methods outperform the L1-based ones, both in terms of the sparsity of the resulting weight matrices and the accuracy of the pruned network. Additionally, we demonstrate that the accuracy of deep neural networks can also be increased using the proposed sparsifying regularizations. |
---|---|
DOI: | 10.1109/ICCAIRO47923.2019.00032 |
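
The combination of Proximal Gradient Descent and RMSProp mentioned in the summary can be illustrated with a short sketch. The PyTorch code below is only a rough, hypothetical rendering of the general technique, not the authors' implementation: it takes an ordinary RMSProp step on the data loss and then applies the proximal operator of the L0 (hard-thresholding) or L1 (soft-thresholding) penalty, using the nominal learning rate as the proximal step size. The function names, the `lam` penalty weight, and the rule of sparsifying only tensors with more than one dimension are assumptions.

```python
import torch


def prox_l1(w: torch.Tensor, thresh: float) -> torch.Tensor:
    """Soft-thresholding: proximal operator of thresh * ||w||_1."""
    return torch.sign(w) * torch.clamp(w.abs() - thresh, min=0.0)


def prox_l0(w: torch.Tensor, lam: float, step: float) -> torch.Tensor:
    """Hard-thresholding: proximal operator of lam * ||w||_0 with step size `step`.
    A weight is kept only if keeping it lowers the objective, i.e. w^2 > 2 * lam * step."""
    return w * ((w * w) > (2.0 * lam * step))


def proximal_rmsprop_step(model, loss_fn, batch, optimizer, lam, lr, norm="l0"):
    """One hypothetical training step: RMSProp on the smooth data loss,
    followed by a proximal step on the non-smooth sparsifying penalty."""
    x, y = batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()                       # RMSProp update of the data loss
    with torch.no_grad():                  # proximal step: threshold the weights
        for p in model.parameters():
            if p.dim() > 1:                # skip biases and other 1-D parameters
                if norm == "l0":
                    p.copy_(prox_l0(p, lam, lr))
                else:
                    p.copy_(prox_l1(p, lam * lr))
    return loss.item()


# Hypothetical usage:
# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
# for batch in loader:
#     proximal_rmsprop_step(model, torch.nn.functional.cross_entropy,
#                           batch, optimizer, lam=1e-5, lr=1e-3)
```

For the group (kernel- or channel-wise) variants discussed in the summary, the elementwise thresholds above would instead be applied to the norm of each group, zeroing a whole kernel or channel at once; a faithful implementation would also account for RMSProp's per-parameter adaptive step sizes rather than the single nominal learning rate used here.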