ADAPTIVE QUANTIZATION FOR EXECUTION OF MACHINE LEARNING MODELS
Main Authors | , , |
---|---|
Format | Patent |
Language | English |
Published | 09.09.2021 |
Summary: | Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared, and it is determined that the results of the second inferences are within a threshold performance level of the results of the first inferences. Based on that determination, one or more subsequent inferences are performed using the machine learning model and the quantized weight information. |
---|---|
Bibliography: | Application Number: US202016810123 |
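
The method outlined in the summary can be illustrated with a minimal Python sketch. This is not the patented implementation; every concrete choice here is an assumption made for illustration: the toy linear classifier (`predict`), symmetric uniform quantization (`quantize`), the use of prediction agreement as the "threshold performance level", and the names `select_weights`, `bits`, and `threshold`.

```python
from typing import List, Sequence

Weights = List[List[float]]


def quantize(weights: Weights, bits: int) -> Weights:
    """Reduce weights to `bits`-bit symmetric uniform levels (assumed scheme).

    Values are returned in dequantized (float) form so the same toy model
    can consume them directly.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2")
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(w) for row in weights for w in row)
    if max_abs == 0.0:
        return [row[:] for row in weights]
    scale = max_abs / levels
    return [[round(w / scale) * scale for w in row] for row in weights]


def predict(weights: Weights, x: Sequence[float]) -> int:
    """Hypothetical model: index of the weight row with the largest dot product."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))


def select_weights(weights: Weights, inputs: Sequence[Sequence[float]],
                   bits: int = 8, threshold: float = 0.95) -> Weights:
    """Return the weights to use for subsequent inferences."""
    if not inputs:
        raise ValueError("need at least one input to compare inferences")
    quantized = quantize(weights, bits)
    first = [predict(weights, x) for x in inputs]     # received weights
    second = [predict(quantized, x) for x in inputs]  # quantized weights
    # Assumed performance measure: fraction of matching predictions.
    agreement = sum(a == b for a, b in zip(first, second)) / len(inputs)
    return quantized if agreement >= threshold else weights
```

For example, `select_weights([[0.9, 0.4], [0.3, 0.8]], [[0.5, 0.6]], bits=8)` returns the quantized weights because both inferences agree, while the same call with `bits=2` falls back to the received weights because the coarser quantization changes the prediction.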