SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks

Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM,...

Full description

Saved in:

Bibliographic Details
Published in	IEEE transactions on computer-aided design of integrated circuits and systems Vol. 43; no. 7; pp. 2163 - 2176
Main Authors	Chen, Yen-Wen, Wang, Rui-Hsuan, Cheng, Yu-Hsiang, Lu, Chih-Cheng, Chang, Meng-Fan, Tang, Kea-Tiong
Format	Journal Article
Language	English
Published	New York IEEE 01.07.2024 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adaptive filters Algorithms Artificial intelligence Artificial neural networks Co-design Common Information Model (computing) Compression algorithms Computational modeling Computer architecture computing-in-memory (CIM) Convolutional neural networks deep learning Efficiency Hardware Machine learning Memory management quantization Quantization (signal) Random access memory Static random access memory Utilization
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Convolutional neural networks (CNNs) play a key role in many deep learning applications; however, these networks are resource intensive. The parallel computing ability of computing-in-memory (CIM) enables high energy efficiency in artificial intelligence accelerators. When implementing a CNN in CIM, quantization and pruning are indispensable for reducing the calculation complexity and improving the efficiency of hardware calculations. Mixed-precision quantization with flexible bit widths provides a better efficiency-accuracy tradeoff than fixed-precision quantization. However, CIM calculations for mixed-precision models are inefficient because the fixed capacity of CIM macros is redundant for hybrid precision distributions. To address this, we propose a software and hardware co-design static random-access memory (SRAM)-based CIM architecture called SUN, including a CIM-adaptive mixed precision joint pruning quantization algorithm and dynamic hybrid precision CNN accelerator. Three techniques are implemented in this architecture: 1) a mixed precision joint pruning algorithm for reducing the memory access and removing the redundant computing; 2) a CIM-adaptive filter-wise and paired mixed-precision quantization for improving CIM macro utilization; and 3) an SRAM-based CIM CNN accelerator in which the SRAM CIM macro is used as the processing element to support sparse and mixed-precision CNN computation with high CIM macro utilization. This architecture achieves a system area efficiency of 428.2 TOPS/mm 2 and throughput of 792.2 GOPS on the CIFAR-10 dataset.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	0278-0070 1937-4151
DOI:	10.1109/TCAD.2024.3358583