A selective quantization approach for optimizing quantized inference engine

Bibliographic Details
Published in: 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), pp. 92-99
Main Authors: Liu, Chang; Zhang, Dong
Format: Conference Proceeding
Language: English
Published: IEEE, 30.07.2023
DOI: 10.1109/ISCTech60480.2023.00024

More Information
Summary: Deep learning based algorithms have achieved excellent performance in many computer vision and pattern recognition tasks. However, it remains challenging to deploy trained networks on resource-limited platforms. Before deploying a well-trained network on an embedded platform, a common practice is to quantize the model and then optimize the quantized model with TensorRT. However, the performance of the resulting inference engine, the output of TensorRT, still leaves room for improvement. This paper quantizes a well-trained network for expression recognition and investigates how the position of quantization affects the performance of the inference engine under quantization aware training. We propose a selective quantization approach, based on the number of parameters per layer, to improve the performance of the quantized inference engine. Experiments across various quantization schemes and quantization thresholds demonstrate that, compared to quantizing the network directly, the proposed strategy further accelerates the engine while maintaining recognition accuracy and compressing the model as much as possible.
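The paper's code is not part of this record; the following is a minimal sketch of the layer-selection step the summary describes, assuming a PyTorch model. It partitions leaf layers by parameter count so that only layers above a threshold are marked for quantization; the `threshold` value and the function name are hypothetical, not taken from the paper.

```python
import torch.nn as nn

def split_by_param_count(model: nn.Module, threshold: int = 100_000):
    """Partition leaf modules of `model` by parameter count.

    Returns two lists of module names: layers with at least `threshold`
    parameters (candidates for quantization) and layers below it (kept
    in full precision). `threshold` is a hypothetical knob, not a value
    from the paper.
    """
    quantize, keep = [], []
    for name, module in model.named_modules():
        if list(module.children()):   # skip container modules, keep leaves
            continue
        n_params = sum(p.numel() for p in module.parameters())
        if n_params == 0:             # skip parameter-free layers (ReLU, pooling)
            continue
        (quantize if n_params >= threshold else keep).append(name)
    return quantize, keep
```

In one plausible workflow matching the summary, the selected layers would then be replaced with their fake-quantized counterparts (e.g., from NVIDIA's pytorch-quantization toolkit) before QAT fine-tuning and export to a TensorRT engine, while the remaining small layers stay in full precision.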