A holistic representation guided attention network for scene text recognition


Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 414, pp. 67-75
Main Authors: Yang, Lu; Wang, Peng; Li, Hui; Li, Zhen; Zhang, Yanning
Format: Journal Article
Language: English
Published: Elsevier B.V., 13.11.2020
Summary: Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation guides the attention-based decoder to focus on more accurate areas. As no recurrent module is adopted, our model can be trained in parallel. It achieves a 1.5× to 9.4× acceleration of the backward pass and a 1.3× to 7.9× acceleration of the forward pass, compared with RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2020.07.010
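The summary describes a non-recurrent, attention-based decoder that reads two-dimensional CNN features in parallel, with a holistic image representation guiding the attention queries. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions, not the authors' implementation: the holistic representation is approximated by global average pooling, and all module names, dimensions, and the learned positional queries are hypothetical choices made for the example.

```python
# Minimal sketch (assumption: PyTorch) of a parallel, non-recurrent attention
# decoder over 2-D CNN features, guided by a holistic representation.
# Not the paper's actual architecture; names and sizes are illustrative.
import torch
import torch.nn as nn


class HolisticGuidedAttentionDecoder(nn.Module):
    def __init__(self, feat_dim=512, num_classes=97, max_len=25):
        super().__init__()
        self.max_len = max_len
        # One learned query per output character position; all positions are
        # decoded in parallel, so no recurrence is needed.
        self.pos_queries = nn.Embedding(max_len, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.holistic_proj = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feat_map):
        # feat_map: (B, C, H, W) two-dimensional CNN features.
        b, c, h, w = feat_map.shape
        keys = feat_map.flatten(2).transpose(1, 2)        # (B, H*W, C)
        # Holistic representation: here, global average pooling (an assumption).
        holistic = self.holistic_proj(keys.mean(dim=1))   # (B, C)
        # Guide every positional query with the holistic representation.
        queries = self.pos_queries.weight.unsqueeze(0).expand(b, -1, -1)
        queries = queries + holistic.unsqueeze(1)         # (B, max_len, C)
        # Attention over the 2-D feature map for all positions at once.
        glimpses, _ = self.attn(queries, keys, keys)
        return self.classifier(glimpses)                  # (B, max_len, num_classes)


if __name__ == "__main__":
    decoder = HolisticGuidedAttentionDecoder()
    logits = decoder(torch.randn(2, 512, 8, 32))
    print(logits.shape)  # torch.Size([2, 25, 97])
```

Because every character position is predicted from its own query in a single attention pass, both forward and backward computation parallelize over the sequence, which is the property the abstract credits for the reported speed-ups over RNN-based decoders.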