A selective quantization approach for optimizing quantized inference engine
| Published in | 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), pp. 92-99 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 30.07.2023 |
| DOI | 10.1109/ISCTech60480.2023.00024 |
| Summary | Deep-learning-based algorithms have achieved excellent performance in many computer vision and pattern recognition tasks. However, deploying a trained network on resource-limited platforms remains challenging. Before deploying a well-trained network on an embedded platform, a popular practice is to quantize the model and then optimize the quantized model with TensorRT. However, the performance of the resulting inference engine, the output of TensorRT, still leaves room for improvement. This paper quantizes a well-trained network for expression recognition and investigates the relationship between the position of quantization and the performance of the inference engine when using quantization-aware training. We propose a selective quantization approach based on the number of parameters to improve the performance of the quantized inference engine. Experiments on various quantization schemes and quantization thresholds demonstrate that, compared to quantizing the engine directly, the proposed strategy further accelerates the engine while maintaining recognition accuracy and compressing the model as much as possible. |
|---|---|
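The summary describes selecting which layers to quantize based on their parameter count. The paper's actual network and threshold are not given here, so the sketch below is purely illustrative: the layer names, weight shapes, and threshold are assumptions, and it shows only the selection rule (quantize layers whose parameter count exceeds a threshold, keep small layers in full precision), not the quantization-aware training itself.

```python
import math

# Hypothetical per-layer weight shapes for a small expression-recognition
# CNN; these values are NOT from the paper, they only illustrate the rule.
layers = {
    "conv1": (32, 3, 3, 3),       # (out_ch, in_ch, kH, kW)
    "conv2": (64, 32, 3, 3),
    "fc1":   (512, 64 * 7 * 7),   # flattened feature map -> hidden layer
    "fc2":   (7, 512),            # e.g. 7 expression classes
}

def param_count(shape):
    """Number of weight parameters in a layer of the given shape."""
    return math.prod(shape)

def select_layers_to_quantize(layers, threshold):
    """Pick only the layers whose parameter count exceeds the threshold.

    Large layers dominate model size and inference cost, so quantizing
    them yields most of the compression and speedup, while small layers
    stay in full precision to protect accuracy.
    """
    return [name for name, shape in layers.items()
            if param_count(shape) > threshold]

selected = select_layers_to_quantize(layers, threshold=10_000)
print(selected)  # → ['conv2', 'fc1']
```

With this toy threshold, only the two largest layers are marked for quantization; sweeping the threshold, as the experiments in the summary do, trades accuracy against model size and engine latency.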