A selective quantization approach for optimizing quantized inference engine

Bibliographic Details
Published in: 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), pp. 92-99
Main Authors: Liu, Chang; Zhang, Dong
Format: Conference Proceeding
Language: English
Published: IEEE, 30.07.2023
DOI: 10.1109/ISCTech60480.2023.00024

More Information
Summary: Deep learning based algorithms have achieved excellent performance in many computer vision and pattern recognition tasks. However, it remains challenging to deploy trained networks on resource-limited platforms. Before deploying a well-trained network on an embedded platform, a common practice is to quantize the model and then optimize the quantized model with TensorRT. However, the performance of the resulting inference engine, the output of TensorRT, still leaves room for improvement. This paper quantizes a well-trained network for expression recognition and investigates how the position of quantization affects the performance of the inference engine under quantization aware training. We propose a selective quantization approach, based on the number of parameters per layer, to improve the performance of the quantized inference engine. Experiments across various quantization schemes and quantization thresholds demonstrate that, compared to quantizing the network directly, the proposed strategy further accelerates the engine while maintaining recognition accuracy and compressing the model as much as possible.
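The paper's code is not part of this record; the following is a minimal sketch of the layer-selection step the summary describes, assuming a PyTorch model. It partitions leaf layers by parameter count so that only layers above a threshold are marked for quantization; the `threshold` value and the function name are hypothetical, not taken from the paper.

```python
import torch.nn as nn

def split_by_param_count(model: nn.Module, threshold: int = 100_000):
    """Partition leaf modules of `model` by parameter count.

    Returns two lists of module names: layers with at least `threshold`
    parameters (candidates for quantization) and layers below it (kept
    in full precision). `threshold` is a hypothetical knob, not a value
    from the paper.
    """
    quantize, keep = [], []
    for name, module in model.named_modules():
        if list(module.children()):   # skip container modules, keep leaves
            continue
        n_params = sum(p.numel() for p in module.parameters())
        if n_params == 0:             # skip parameter-free layers (ReLU, pooling)
            continue
        (quantize if n_params >= threshold else keep).append(name)
    return quantize, keep
```

In one plausible workflow matching the summary, the selected layers would then be replaced with their fake-quantized counterparts (e.g., from NVIDIA's pytorch-quantization toolkit) before QAT fine-tuning and export to a TensorRT engine, while the remaining small layers stay in full precision.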