CURSOR-BASED ADAPTIVE QUANTIZATION FOR DEEP NEURAL NETWORKS

Deep neural networks (DNN) model quantization may be used to reduce storage and computation burdens by decreasing the bit width. Presented herein are novel cursor-based adaptive quantization embodiments. In embodiments, a multiple bits quantization mechanism is formulated as a differentiable archite...

Full description

Saved in:

Bibliographic Details
Main Authors	CHENG, Zhiyu, LI, Baopu, FAN, Yanwen, BAO, Yingze
Format	Patent
Language	English
Published	29.07.2021
Subjects	CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING PHYSICS
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Deep neural networks (DNN) model quantization may be used to reduce storage and computation burdens by decreasing the bit width. Presented herein are novel cursor-based adaptive quantization embodiments. In embodiments, a multiple bits quantization mechanism is formulated as a differentiable architecture search (DAS) process with a continuous cursor that represents a possible quantization bit. In embodiments, the cursor-based DAS adaptively searches for a quantization bit for each layer. The DAS process may be accelerated via an alternative approximate optimization process, which is designed for mixed quantization scheme of a DNN model. In embodiments, a new loss function is used in the search process to simultaneously optimize accuracy and parameter size of the model. In a quantization step, the closest two integers to the cursor may be adopted as the bits to quantize the DNN together to reduce the quantization noise and avoid the local convergence problem.
Bibliography:	Application Number: US201916966834