Optimization of Direct Convolution Algorithms on ARM Processors for Deep Learning Inference

Bibliographic Details
Published in: Mathematics (Basel), Vol. 13, no. 5, p. 787
Main Authors: Li, Shang; Yu, Fei; Zhang, Shankou; Yin, Huige; Lin, Hairong
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.03.2025
ISSN: 2227-7390
DOI: 10.3390/math13050787

Summary: In deep learning, convolutional layers typically bear the majority of the computational workload and are often the primary contributors to performance bottlenecks. The most widely used convolution algorithm relies on the IM2COL transform to exploit the highly optimized GEMM (General Matrix Multiplication) kernels of BLAS (Basic Linear Algebra Subprograms) libraries, but this approach tends to incur additional memory overhead. Recent studies have indicated that direct convolution approaches can outperform traditional IM2COL-based implementations without that additional memory overhead. In this paper, we propose a high-performance implementation of the direct convolution algorithm for inference that preserves the channel-first data layout of the convolutional layer inputs/outputs. We evaluate the proposed algorithm on a multi-core ARM CPU platform and compare it with state-of-the-art convolution optimization techniques. Experimental results demonstrate that the new algorithm performs better across the evaluated scenarios and platforms.
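
To make the contrast in the abstract concrete, the sketch below shows a plain direct convolution over a channel-first (NCHW) tensor in C: the kernel reads the input in place, so no IM2COL scratch buffer is ever materialized. The function name, argument names, and the stride-1/no-padding simplification are illustrative assumptions; the paper's ARM-optimized kernel (vectorization, cache blocking, multi-threading) is not reproduced here.

/*
 * Minimal sketch (not the paper's kernel): a plain direct convolution over a
 * channel-first (NCHW) layout. Stride 1, no padding, single image.
 */
#include <stddef.h>

/* in : [c_in][h][w]            input feature maps, channel-first
 * wgt: [c_out][c_in][kh][kw]   filter weights
 * out: [c_out][h-kh+1][w-kw+1] output feature maps, channel-first
 */
static void direct_conv_nchw(const float *in, const float *wgt, float *out,
                             int c_in, int h, int w,
                             int c_out, int kh, int kw)
{
    const int oh = h - kh + 1;   /* output height */
    const int ow = w - kw + 1;   /* output width  */

    for (int co = 0; co < c_out; ++co)
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x) {
                float acc = 0.0f;
                /* Accumulate over all input channels and the kernel window,
                 * reading the input in place -- no IM2COL scratch buffer. */
                for (int ci = 0; ci < c_in; ++ci)
                    for (int ky = 0; ky < kh; ++ky)
                        for (int kx = 0; kx < kw; ++kx) {
                            float v = in[(size_t)ci * h * w
                                         + (size_t)(y + ky) * w + (x + kx)];
                            float f = wgt[(((size_t)co * c_in + ci) * kh + ky) * kw + kx];
                            acc += v * f;
                        }
                out[(size_t)co * oh * ow + (size_t)y * ow + x] = acc;
            }
}

By contrast, the IM2COL route first expands the input into a (c_in * kh * kw) x (oh * ow) matrix and then calls a GEMM; that temporary matrix is the additional memory overhead the abstract refers to, which direct convolution avoids.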