ADAPTIVE QUANTIZATION FOR EXECUTION OF MACHINE LEARNING MODELS

Bibliographic Details
Main Authors: CHATHA, Karamvir; GADELRAB, Serag; ROSENBERG, Ofer
Format: Patent
Language: English
Published: 09.09.2021
Summary: Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared. When it is determined that results of the second inferences are within a threshold performance level of results of the first inferences, one or more subsequent inferences are performed, based on that determination, using the machine learning model and the quantized weight information.
Bibliography: Application Number: US202016810123
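
Example (illustrative only): the flow described in the summary (quantize the weights, run inferences with both weight sets, compare, then decide which set to use) might be sketched roughly as below. This is not taken from the patent text. The toy dot-product "model", the symmetric uniform quantizer, the use of maximum absolute output difference as the "threshold performance level", and every function and parameter name are assumptions made for illustration.

```python
# Hypothetical sketch; names and the comparison metric are assumptions,
# not the method claimed in the patent.

def quantize(weights, bits=8):
    """Symmetric uniform quantization of floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def infer(weights, x, scale=1.0):
    """Toy stand-in for a model: one dot product, rescaled for quantized weights."""
    return scale * sum(w * xi for w, xi in zip(weights, x))


def choose_weights(weights, samples, bits=8, threshold=0.01):
    """Compare full-precision and quantized inferences on the same inputs.

    Returns (use_quantized, worst_difference). The threshold is applied to the
    largest absolute output difference, which is an assumed metric.
    """
    q, scale = quantize(weights, bits)
    first = [infer(weights, x) for x in samples]      # received weights
    second = [infer(q, x, scale) for x in samples]    # quantized weights
    worst = max(abs(a - b) for a, b in zip(first, second))
    return worst <= threshold, worst


weights = [0.5, -0.25, 1.0]
samples = [[1, 2, 3], [0, 1, 0]]
ok_8bit, diff_8bit = choose_weights(weights, samples, bits=8)
ok_2bit, diff_2bit = choose_weights(weights, samples, bits=2)
```

In this sketch, a caller would run subsequent inferences with the quantized weights only when the first returned value is true, and would otherwise keep the received weights or try a larger bit size.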