SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity
Main Authors | , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 30.10.2023 |
Subjects | |
Summary: | To address the challenge of increasing network size, researchers have
developed sparse models through network pruning. However, maintaining model
accuracy while achieving significant speedups on general computing devices
remains an open problem. In this paper, we present a novel mobile inference
acceleration framework, SparseByteNN, which leverages fine-grained kernel
sparsity to achieve real-time execution as well as high accuracy. Our framework
consists of two parts: (a) A fine-grained kernel sparsity schema with a
sparsity granularity between structured pruning and unstructured pruning. It
designs multiple sparse patterns for different operators. Combined with our
proposed whole network rearrangement strategy, the schema achieves a high
compression rate and high precision at the same time. (b) An inference engine
co-optimized with the sparse pattern. The conventional wisdom is that the
reduction in theoretical FLOPs from sparsity does not translate into real-world
efficiency gains. We aim to correct this misconception by introducing a family of
efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient
implementation of sparse primitives, we show that sparse versions of
MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy
curve. Experimental results on Qualcomm 855 show that for 30% sparse
MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and
1.29x speedup over the state-of-the-art sparse inference engine MNN with a
slight accuracy drop of 0.224%. The source code of SparseByteNN will be
available at https://github.com/lswzjuer/SparseByteNN |
---|---|
DOI: | 10.48550/arxiv.2310.19509 |
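
The summary describes a sparsity granularity between structured and unstructured pruning. As a rough illustration of that general idea, the sketch below zeroes out whole contiguous groups of weights rather than single weights or entire filters. It is a minimal sketch under stated assumptions, not SparseByteNN's schema: the function name `group_prune`, the 1 x `group_size` grouping along each row, and the L1-magnitude ranking are all hypothetical choices made for this example.

```python
from typing import List


def group_prune(weights: List[List[float]], group_size: int,
                sparsity: float) -> List[List[float]]:
    """Zero out the lowest-magnitude contiguous groups of a weight matrix.

    Hypothetical illustration: the grouping (1 x group_size blocks along a
    row) and the L1-magnitude criterion are assumptions made for this
    sketch, not the sparse patterns defined by SparseByteNN.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")

    # Score every group by the sum of absolute values of its weights.
    scored = []
    for r, row in enumerate(weights):
        if len(row) % group_size != 0:
            raise ValueError("row length must be a multiple of group_size")
        for start in range(0, len(row), group_size):
            score = sum(abs(w) for w in row[start:start + group_size])
            scored.append((score, r, start))

    # Prune the requested fraction of groups, smallest scores first.
    # Ties are broken by (row, column), so the result is deterministic.
    n_prune = int(len(scored) * sparsity)
    scored.sort()

    pruned = [list(row) for row in weights]  # copy; the input is not modified
    for _, r, start in scored[:n_prune]:
        for c in range(start, start + group_size):
            pruned[r][c] = 0.0
    return pruned


demo = [[0.9, -0.8, 0.01, 0.02],
        [0.03, -0.01, 0.7, 0.6]]
pruned_demo = group_prune(demo, group_size=2, sparsity=0.5)
# pruned_demo == [[0.9, -0.8, 0.0, 0.0], [0.0, 0.0, 0.7, 0.6]]
```

Because zeros arrive in fixed-size runs, an inference kernel can skip a whole group with one check instead of testing each weight, which is the usual motivation for group-level granularity. With `group_size=1` the sketch reduces to unstructured magnitude pruning, and with `group_size` equal to the row length it removes entire rows.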