ES-MPQ: Evolutionary Search Enabled Mixed Precision Quantization Framework for Computing-in-Memory

Bibliographic Details
Published in: 2023 IEEE 12th Non-Volatile Memory Systems and Applications Symposium (NVMSA), pp. 38-43
Main Authors: Sun, Sifan; Ge, Jinming; Bai, Jinyu; Kang, Wang
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2023

Summary: Network quantization can effectively reduce complexity without changing the network structure, which is conducive to deploying deep neural networks (DNNs) on edge devices. However, most existing methods set the quantization precision manually and rarely consider cases where the computing array is limited, such as in computing-in-memory (CIM). In this paper, we introduce a novel method named ES-MPQ, which employs evolutionary search to achieve mixed-precision quantization with a small calibration dataset. ES-MPQ can optimize multiple objectives to achieve better hardware efficiency. Experimental results for ResNet-18 on CIFAR-10 show that the proposed ES-MPQ can reduce parameter size and energy consumption by up to 1.89x and 2.81x, respectively, compared with fixed bit-width (8-bit) quantization, while losing only 0.59% accuracy.
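
The abstract does not give implementation details, but the core idea it describes (an evolutionary search over per-layer bit-widths, with candidates scored on a small calibration dataset against both accuracy and hardware cost) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the layer count, candidate bit-widths, evolutionary hyperparameters, and the toy evaluate() fitness are all assumptions standing in for a real quantized-network evaluation and CIM energy/parameter-size models.

```python
# Minimal sketch of an evolutionary search over per-layer bit-widths
# (illustrative only; not the ES-MPQ implementation).
import random

NUM_LAYERS = 18             # assumed number of quantizable layers (e.g., ResNet-18)
BIT_CHOICES = [2, 4, 6, 8]  # assumed candidate precisions per layer
POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 0.1

def evaluate(policy):
    """Placeholder fitness: trades off a fake accuracy proxy against average
    bit-width. A real system would quantize the network with `policy`, measure
    accuracy on a small calibration set, and estimate CIM energy and parameter size."""
    avg_bits = sum(policy) / len(policy)
    acc_proxy = 1.0 - 0.02 * (8 - avg_bits)  # toy: fewer bits -> lower accuracy
    cost = avg_bits / 8.0                    # toy: normalized hardware cost
    return acc_proxy - 0.5 * cost            # scalarized multi-objective score

def mutate(policy):
    # Randomly re-draw some layers' bit-widths.
    return [random.choice(BIT_CHOICES) if random.random() < MUTATION_RATE else b
            for b in policy]

def crossover(a, b):
    # Single-point crossover of two bit-width assignments.
    cut = random.randrange(1, NUM_LAYERS)
    return a[:cut] + b[cut:]

# Initialize a random population of per-layer bit-width assignments.
population = [[random.choice(BIT_CHOICES) for _ in range(NUM_LAYERS)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=evaluate, reverse=True)
    parents = population[:POP_SIZE // 2]             # keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=evaluate)
print("best per-layer bit-widths:", best)
```

In practice the fitness would be multi-objective (accuracy on the calibration set versus energy and parameter size under the CIM array constraints) rather than the single scalar score used in this toy example.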
ISSN: 2575-257X
DOI: 10.1109/NVMSA58981.2023.00018