Research on quantitative inference acceleration technology of Convolutional Neural Network for ARM Platform
Published in: 2022 16th IEEE International Conference on Signal Processing (ICSP), Vol. 1, pp. 208–211
Main Authors: , , ,
Format: Conference Proceeding
Language: English
Published: IEEE, 21.10.2022
ISSN: 2164-5221
DOI: 10.1109/ICSP56322.2022.9964483
Summary: With the rapid development of the Internet of Things, the advantages of edge computing, such as low latency, high availability, and strong real-time performance, have become increasingly prominent, and compute-intensive applications such as convolutional neural networks are increasingly deployed at the mobile edge. However, deploying convolutional neural networks on mobile terminals is constrained by their high computational density, high parallelism, and large number of floating-point operations, set against the relatively limited computing resources of mobile devices. To optimize a convolutional neural network and deploy it on the mobile side, this paper loads the network parameters from a file, applies a dynamic weight-pruning method to cut away redundant weights, quantizes the remaining weights to fixed point, and designs an offline encoding scheme for the convolution kernels, all in order to reduce the floating-point computation and storage consumed by the convolution operation. A convolution-layer algorithm is designed to process the input and output and to carry out the convolution between the input and the kernels. The SIMD instructions provided by the ARM CPU, namely the NEON instruction set, are then used to optimize the convolution layer at the instruction level, sharing as much computation as possible inside the convolution module so as to reduce the number of convolution loop iterations and exploit the full performance of SIMD. Experiments confirm the acceleration achieved by these optimizations.
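The summary names three model-side reductions: dynamic weight pruning, fixed-point quantization, and offline kernel encoding. It does not spell out the pruning criterion; a common reading of "cutting out the redundant weight" is magnitude-based pruning, sketched below in C. The function name, the flat weight array, and the fixed threshold are illustrative assumptions, not the paper's interface; the "dynamic" scheme would presumably recompute the threshold per layer rather than use a single cutoff.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical magnitude-based pruning pass (the paper's "dynamic
 * network weight cutting" likely adapts the threshold per layer; a
 * fixed cutoff is assumed here for brevity). Returns the number of
 * weights zeroed out. */
size_t prune_weights(float *w, size_t n, float threshold)
{
    size_t pruned = 0;
    for (size_t i = 0; i < n; i++) {
        if (fabsf(w[i]) < threshold) {
            w[i] = 0.0f;   /* redundant weight: cut it */
            pruned++;
        }
    }
    return pruned;
}
```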
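For the fixed-point step, the summary only says the network weights are fixed-pointed; one plausible concrete form is symmetric per-layer quantization to int8, sketched here. The bit width, the clamp to ±127, and the returned scale factor are all assumptions, since the paper does not state its quantization format.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-layer fixed-point quantization to int8 (an assumed
 * format). The returned scale maps quantized values back to reals:
 * w[i] is approximately q[i] * scale. */
void quantize_weights_int8(const float *w, int8_t *q, size_t n,
                           float *scale_out)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);

    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++) {
        float v = roundf(w[i] / scale);
        if (v >  127.0f) v =  127.0f;   /* clamp to int8 range */
        if (v < -127.0f) v = -127.0f;
        q[i] = (int8_t)v;
    }
    *scale_out = scale;
}
```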
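The NEON optimization the summary describes amounts to vectorizing the convolution's inner multiply-accumulate loop so that one instruction covers several elements. Assuming the int8 weights from the previous sketch, the inner dot product could look like the following; the function name and the flattened row layout are hypothetical, and the outer loops over output positions and channels are omitted.

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Inner dot product of an input patch row with a quantized kernel
 * row, eight int8 elements per NEON iteration. vmull_s8 widens the
 * products to int16, and vpadalq_s16 pairwise-adds them into four
 * int32 accumulator lanes. */
int32_t dot_s8_neon(const int8_t *x, const int8_t *k, size_t n)
{
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        int16x8_t prod = vmull_s8(vld1_s8(x + i), vld1_s8(k + i));
        acc = vpadalq_s16(acc, prod);
    }
    /* Horizontal sum of the four accumulator lanes. */
    int32_t sum = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1)
                + vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
    for (; i < n; i++)           /* scalar tail */
        sum += (int32_t)x[i] * (int32_t)k[i];
    return sum;
}
```

Relative to a scalar loop, each vector iteration here replaces eight multiply-adds with two instructions, which is the kind of reduction in convolution loop cycles the summary attributes to NEON.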