Research on quantitative inference acceleration technology of Convolutional Neural Network for ARM Platform
Published in: 2022 16th IEEE International Conference on Signal Processing (ICSP), Vol. 1, pp. 208–211
Main Authors: , , ,
Format: Conference Proceeding
Language: English
Published: IEEE, 21.10.2022
ISSN: 2164-5221
DOI: 10.1109/ICSP56322.2022.9964483
Summary: With the rapid development of the Internet of Things, the advantages of edge computing, such as low latency, high availability, and strong real-time performance, have become increasingly prominent, and compute-intensive applications such as convolutional neural networks are increasingly deployed at the mobile edge. However, deploying convolutional neural networks on mobile terminals is constrained by their high computational density, high parallelism, and large number of floating-point operations, set against the relatively limited computing resources of mobile devices. To optimize a convolutional neural network and deploy it on the mobile side, this paper loads the network parameters from a file, applies a dynamic weight-pruning method to cut away redundant weights, quantizes the remaining weights to fixed point, and designs an offline encoding scheme for the convolution kernels, all in order to reduce the floating-point computation and storage consumed by the convolution operation. A convolution-layer algorithm is designed to process the input and output and to carry out the convolution between the input and the kernels. The SIMD instructions provided by the ARM CPU, namely the NEON instruction set, are then used to optimize the convolution layer at the instruction level, sharing as much computation as possible inside the convolution module so as to reduce the number of convolution loop iterations and exploit the full performance of SIMD. Experiments confirm the acceleration achieved by these optimizations.
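The summary names three model-side reductions: dynamic weight pruning, fixed-point quantization, and offline kernel encoding. It does not spell out the pruning criterion; a common reading of "cutting out the redundant weight" is magnitude-based pruning, sketched below in C. The function name, the flat weight array, and the fixed threshold are illustrative assumptions, not the paper's interface; the "dynamic" scheme would presumably recompute the threshold per layer rather than use a single cutoff.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical magnitude-based pruning pass (the paper's "dynamic
 * network weight cutting" likely adapts the threshold per layer; a
 * fixed cutoff is assumed here for brevity). Returns the number of
 * weights zeroed out. */
size_t prune_weights(float *w, size_t n, float threshold)
{
    size_t pruned = 0;
    for (size_t i = 0; i < n; i++) {
        if (fabsf(w[i]) < threshold) {
            w[i] = 0.0f;   /* redundant weight: cut it */
            pruned++;
        }
    }
    return pruned;
}
```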
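For the fixed-point step, the summary only says the network weights are fixed-pointed; one plausible concrete form is symmetric per-layer quantization to int8, sketched here. The bit width, the clamp to ±127, and the returned scale factor are all assumptions, since the paper does not state its quantization format.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-layer fixed-point quantization to int8 (an assumed
 * format). The returned scale maps quantized values back to reals:
 * w[i] is approximately q[i] * scale. */
void quantize_weights_int8(const float *w, int8_t *q, size_t n,
                           float *scale_out)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);

    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++) {
        float v = roundf(w[i] / scale);
        if (v >  127.0f) v =  127.0f;   /* clamp to int8 range */
        if (v < -127.0f) v = -127.0f;
        q[i] = (int8_t)v;
    }
    *scale_out = scale;
}
```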
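The NEON optimization the summary describes amounts to vectorizing the convolution's inner multiply-accumulate loop so that one instruction covers several elements. Assuming the int8 weights from the previous sketch, the inner dot product could look like the following; the function name and the flattened row layout are hypothetical, and the outer loops over output positions and channels are omitted.

```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Inner dot product of an input patch row with a quantized kernel
 * row, eight int8 elements per NEON iteration. vmull_s8 widens the
 * products to int16, and vpadalq_s16 pairwise-adds them into four
 * int32 accumulator lanes. */
int32_t dot_s8_neon(const int8_t *x, const int8_t *k, size_t n)
{
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        int16x8_t prod = vmull_s8(vld1_s8(x + i), vld1_s8(k + i));
        acc = vpadalq_s16(acc, prod);
    }
    /* Horizontal sum of the four accumulator lanes. */
    int32_t sum = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1)
                + vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
    for (; i < n; i++)           /* scalar tail */
        sum += (int32_t)x[i] * (int32_t)k[i];
    return sum;
}
```

Relative to a scalar loop, each vector iteration here replaces eight multiply-adds with two instructions, which is the kind of reduction in convolution loop cycles the summary attributes to NEON.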