A Heuristic and Greedy Weight Remapping Scheme with Hardware Optimization for Irregular Sparse Neural Networks Implemented on CIM Accelerator in Edge AI Applications

Bibliographic Details
Published in 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 551 - 556
Main Authors Wu, Lizhou, Zhao, Chenyang, Wang, Jingbo, Yu, Xueru, Chen, Shoumian, Li, Chen, Han, Jun, Xue, Xiaoyong, Zeng, Xiaoyang
Format Conference Proceeding
Language English
Published IEEE 22.01.2024

Summary: Computing-in-memory (CIM) is a promising technique for hardware acceleration of neural networks (NNs) with high performance and efficiency. However, conventional dense mapping schemes cannot effectively support the compression and optimization of irregular sparse NNs. In this paper, we propose a heuristic and greedy weight remapping scheme for irregular sparse neural networks implemented on a CIM accelerator in edge AI applications. A genetic algorithm (GA) is, for the first time, utilized for the column shuffle in sparse weight remapping. Combined with a granularity exploration of the CIM, the proportion of compressible all-zero rows increases remarkably. A greedy algorithm is then employed to planarize the unevenly compressed units, thereby improving the storage utilization of the crossbar. For hardware optimization, the pipeline is customized with a zero-skipping circuit that leverages bit-level activation sparsity at runtime. Our results show that the proposed remapping scheme achieves a 70%-94% sparsity utilization rate, a 1.3× improvement on average over naive compression. The co-optimized CIM achieves a 3-7.6× speedup and 2.1-4.8× higher energy efficiency compared with the baseline for dense NNs.
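The core column-shuffle idea from the summary can be sketched as a toy genetic algorithm: permute the columns of a sparse weight matrix so that, within each column group of a chosen granularity, as many rows as possible become all-zero and thus compressible. This is an illustrative reconstruction, not the authors' implementation; the group width, fitness function, and swap-mutation operator are all assumptions.

```python
import random

import numpy as np

def all_zero_rows(w, group):
    """Count rows that are entirely zero within each column group of width `group`.
    Each such row can be skipped (compressed) when mapped to a crossbar unit."""
    count = 0
    for start in range(0, w.shape[1], group):
        block = w[:, start:start + group]
        count += int(np.sum(~block.any(axis=1)))  # rows with no nonzero weight
    return count

def ga_column_shuffle(w, group=4, pop_size=20, generations=50, seed=0):
    """Toy GA: evolve column permutations that maximize the number of
    compressible all-zero rows inside each column group of the CIM macro."""
    rng = random.Random(seed)
    ncols = w.shape[1]

    def fitness(perm):
        return all_zero_rows(w[:, perm], group)

    # Initial population: random column permutations.
    pop = [rng.sample(range(ncols), ncols) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]  # elitism: keep the best half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(ncols), 2)  # swap mutation on two columns
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

For a small matrix whose zeros are scattered across column groups, the GA finds a permutation that gathers zeros into the same groups, raising the count of compressible rows relative to the original column order. The greedy planarization step that follows in the paper would then balance the resulting uneven group heights across crossbar units.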
ISSN:2153-697X
DOI:10.1109/ASP-DAC58780.2024.10473919