A holistic representation guided attention network for scene text recognition


Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 414, pp. 67-75
Main Authors: Yang, Lu; Wang, Peng; Li, Hui; Li, Zhen; Zhang, Yanning
Format: Journal Article
Language: English
Published: Elsevier B.V., 13.11.2020
Summary: Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation guides the attention-based decoder to focus on more accurate areas. As no recurrent module is adopted, our model can be trained in parallel. It achieves a 1.5× to 9.4× acceleration of the backward pass and a 1.3× to 7.9× acceleration of the forward pass, compared with RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2020.07.010
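The summary describes a non-recurrent, attention-based decoder that reads two-dimensional CNN features in parallel, with a holistic image representation guiding the attention queries. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions, not the authors' implementation: the holistic representation is approximated by global average pooling, and all module names, dimensions, and the learned positional queries are hypothetical choices made for the example.

```python
# Minimal sketch (assumption: PyTorch) of a parallel, non-recurrent attention
# decoder over 2-D CNN features, guided by a holistic representation.
# Not the paper's actual architecture; names and sizes are illustrative.
import torch
import torch.nn as nn


class HolisticGuidedAttentionDecoder(nn.Module):
    def __init__(self, feat_dim=512, num_classes=97, max_len=25):
        super().__init__()
        self.max_len = max_len
        # One learned query per output character position; all positions are
        # decoded in parallel, so no recurrence is needed.
        self.pos_queries = nn.Embedding(max_len, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.holistic_proj = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feat_map):
        # feat_map: (B, C, H, W) two-dimensional CNN features.
        b, c, h, w = feat_map.shape
        keys = feat_map.flatten(2).transpose(1, 2)        # (B, H*W, C)
        # Holistic representation: here, global average pooling (an assumption).
        holistic = self.holistic_proj(keys.mean(dim=1))   # (B, C)
        # Guide every positional query with the holistic representation.
        queries = self.pos_queries.weight.unsqueeze(0).expand(b, -1, -1)
        queries = queries + holistic.unsqueeze(1)         # (B, max_len, C)
        # Attention over the 2-D feature map for all positions at once.
        glimpses, _ = self.attn(queries, keys, keys)
        return self.classifier(glimpses)                  # (B, max_len, num_classes)


if __name__ == "__main__":
    decoder = HolisticGuidedAttentionDecoder()
    logits = decoder(torch.randn(2, 512, 8, 32))
    print(logits.shape)  # torch.Size([2, 25, 97])
```

Because every character position is predicted from its own query in a single attention pass, both forward and backward computation parallelize over the sequence, which is the property the abstract credits for the reported speed-ups over RNN-based decoders.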