A novel pipeline framework for multi oriented scene text image detection and recognition

Bibliographic Details
Published in	Expert Systems with Applications, Vol. 170, p. 114549
Main Authors	Naiemi, Fatemeh; Ghods, Vahid; Khalesi, Hassan
Format	Journal Article
Language	English
Published	New York: Elsevier Ltd / Elsevier BV, 15.05.2021
Subjects	Algorithms; Artificial neural networks; Autonomous cars; Character recognition; Convolution; Convolutional neural network (CNN); End to end recognition; Feature extraction; Image detection; Multi oriented; Object recognition; Pipelines; Scene text localization; Text image detection; Text recognition

Summary:	•A unified framework is proposed that handles both scene text detection and recognition.
•A new.i.ReLU layer is introduced that can detect text components (even vertical ones).
•A new.i.inception layer is introduced that can capture text of widely varying sizes.
•A novel algorithm (LWDP) is introduced for feature extraction.
Automatic text detection and recognition (end-to-end text recognition) in real-life images is a main element of many applications, including assistance systems for blind and low-vision users and self-driving cars. However, detecting curved and vertical text is challenging because of color bleeding, font size variation, and complicated backgrounds. In this paper, a convolutional neural network-based pipeline is introduced to obtain high-level visual features and improve text detection and recognition efficiency. A ResNet-50 network pre-trained on ImageNet and SynthText is used to extract low-level visual features. Moreover, the proposed structure uses new improved ReLU layer (new.i.ReLU) blocks with varied receptive fields, which have a strong ability to detect text components even on curved surfaces. New improved inception layers (new.i.inception) capture text of widely varying sizes more effectively than a linear chain of convolution layers. We also propose a pipeline framework for character recognition that is robust to irregular (curved and vertical) text. First, we introduce a novel algorithm, called the local word directional pattern (LWDP), that encodes each pixel's value into a new value that highlights the texture of the characters. Then, the LWDP output is used as the input image for the text recognition process. Experiments on standard benchmarks, including the ICDAR 2013, ICDAR 2015, and ICDAR 2019 datasets, show that the proposed architecture outperforms prior work.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2020.114549
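
Illustrative note: the summary describes LWDP only as an encoding that maps each pixel's value to a new one highlighting character texture, and this record gives no further detail. As a rough picture of what a directional pattern encoding of this general kind can look like, the sketch below implements the classic local directional pattern (LDP) with Kirsch edge masks. It is not the authors' LWDP. The function names, the choice of k = 3, and the list-of-lists image format are all assumptions made for this example.

```python
# Illustrative only: classic local directional pattern (LDP) using Kirsch masks.
# This is NOT the paper's LWDP, whose details are not given in this record.

# Offsets of the 8 neighbours, clockwise, starting at the top-left corner.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

# Kirsch weights along RING for the east-facing mask.
# Rotating this list by one step at a time yields the other seven masks.
BASE_WEIGHTS = [-3, -3, 5, 5, 5, -3, -3, -3]


def kirsch_responses(img, r, c):
    """Return the 8 Kirsch edge responses at interior pixel (r, c)."""
    neighbours = [img[r + dr][c + dc] for dr, dc in RING]
    responses = []
    for shift in range(8):
        weights = [BASE_WEIGHTS[(i - shift) % 8] for i in range(8)]
        responses.append(sum(w * n for w, n in zip(weights, neighbours)))
    return responses


def ldp_code(img, r, c, k=3):
    """Encode pixel (r, c) as an 8-bit code with k bits set.

    Bit i is set when direction i is among the k strongest responses
    (compared by absolute value). k=3 is an assumed, commonly used default.
    """
    responses = kirsch_responses(img, r, c)
    strongest = sorted(range(8), key=lambda i: abs(responses[i]), reverse=True)[:k]
    return sum(1 << i for i in strongest)


def ldp_encode(img, k=3):
    """Encode every interior pixel; the border is skipped, so the output
    has two fewer rows and two fewer columns than the input."""
    rows, cols = len(img), len(img[0])
    return [
        [ldp_code(img, r, c, k) for c in range(1, cols - 1)]
        for r in range(1, rows - 1)
    ]


if __name__ == "__main__":
    # Tiny grayscale patch with a vertical edge on the right.
    patch = [
        [0, 0, 0, 10],
        [0, 0, 0, 10],
        [0, 0, 0, 10],
    ]
    for row in ldp_encode(patch):
        print(row)
```

In a pipeline like the one summarized above, an encoded image of this kind would stand in for the raw pixels as input to the recognition stage. How the actual LWDP differs from this generic LDP sketch would need to be taken from the full text of the article.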