Compressed Vision Transformer for Scene Text Recognition

Bibliographic Details
Published in: 2024 7th International Conference on Algorithms, Computing and Artificial Intelligence (ACAI), pp. 01 - 05
Main Authors: Ren, Jinbiao; Deng, Tao; Huang, Yanlin; Qu, Da; Su, Jianqiu; Li, Bingen
Format: Conference Proceeding
Language: English
Published: IEEE, 20.12.2024
DOI: 10.1109/ACAI63924.2024.10899477

Summary: With the advancement of scene text recognition (STR) and deep learning, an increasing number of models have been proposed and applied to scene text recognition tasks. However, deploying these powerful yet computationally intensive models on resource-constrained devices is challenging. Model pruning is one of the most effective methods for compressing and accelerating such models: it reduces the parameter count and computational load by removing less critical parameters or structures. In ViT models, each parameter influences its neighboring parameters locally. Therefore, rather than pruning solely based on parameter magnitude, we propose selecting parameters for removal based on their local influence. By calculating the combined impact of each parameter together with its neighbors, we identify and prune those with minimal overall influence on the model, achieving compression and acceleration without significantly compromising accuracy. Our pruning method substantially reduces parameter count and computational cost while preserving accuracy, as demonstrated across seven test datasets and in comparison with more than five similar STR algorithms.
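The summary describes scoring each weight by the combined magnitude of the weight and its neighbors, then pruning the lowest-scoring entries. The sketch below illustrates that idea on a 2-D weight matrix; the uniform square neighborhood window, the sum-of-absolute-values score, and the `prune_by_local_influence` helper are assumptions for illustration, since the record does not give the paper's exact formulation.

```python
import numpy as np

def local_influence_scores(weights: np.ndarray, radius: int = 1) -> np.ndarray:
    """Score each weight by the summed |magnitude| of itself and its
    neighbors within a (2*radius+1) x (2*radius+1) window.
    NOTE: the window shape and score are illustrative assumptions."""
    h, w = weights.shape
    padded = np.pad(np.abs(weights), radius, mode="constant")
    scores = np.zeros((h, w), dtype=float)
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            scores += padded[radius + di : radius + di + h,
                             radius + dj : radius + dj + w]
    return scores

def prune_by_local_influence(weights: np.ndarray, sparsity: float = 0.5,
                             radius: int = 1):
    """Zero out roughly `sparsity` of the weights with the lowest
    local-influence score; returns the pruned weights and the kept-mask."""
    scores = local_influence_scores(weights, radius)
    k = int(sparsity * weights.size)
    threshold = np.partition(scores.ravel(), k)[k]  # k-th smallest score
    mask = scores >= threshold
    return weights * mask, mask
```

Under this scoring, a large isolated weight surrounded by near-zero neighbors can be pruned while a moderate weight inside a high-magnitude region is kept, which is the stated contrast with pure magnitude pruning.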