A novel pipeline framework for multi oriented scene text image detection and recognition

Bibliographic Details
Published in Expert systems with applications Vol. 170; p. 114549
Main Authors Naiemi, Fatemeh, Ghods, Vahid, Khalesi, Hassan
Format Journal Article
Language English
Published New York Elsevier Ltd 15.05.2021
Elsevier BV
Summary:
• A unified framework is proposed that covers both scene text detection and recognition.
• A new improved ReLU layer (new.i.ReLU) is introduced that can detect text components, even vertical ones.
• A new improved inception layer (new.i.inception) is introduced that can capture text of widely varying sizes.
• A novel feature extraction algorithm, the local word directional pattern (LWDP), is introduced.
Automatic text detection and recognition (end-to-end text recognition) in real-life images is a core element of many applications, including assistance systems for blind and low-vision users and self-driving cars. However, curved and vertical text is challenging to detect because of color bleeding, font size variation, and complicated backgrounds. In this paper, a convolutional neural network-based pipeline is introduced to obtain high-level visual features and improve text detection and recognition efficiency. A ResNet-50 network pre-trained on ImageNet and SynthText was used to extract low-level visual features. Moreover, the proposed structure uses new improved ReLU layer (new.i.ReLU) blocks with a varied receptive field, which have a strong ability to detect text components even on curved surfaces. A new improved inception layer (new.i.inception) can capture text of widely varying sizes more effectively than a linear chain of convolution layers. We also propose a pipeline framework for character recognition that is robust to irregular (curved and vertical) text. First, we introduce a novel algorithm, called the local word directional pattern (LWDP), that encodes each pixel's value as a new value that highlights the texture of the characters. The output of LWDP is then used as the input image for the text recognition process. Experiments on standard benchmarks, including the ICDAR 2013, ICDAR 2015, and ICDAR 2019 datasets, showed that the proposed architecture outperforms prior works.
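This record does not define LWDP beyond the sentence in the summary. To illustrate the general idea of re-encoding each pixel from the directional edge responses around it, the sketch below implements the classic local directional pattern (LDP) with Kirsch masks. LDP is an earlier, different descriptor and is not the authors' LWDP. The function names, the 3×3 window, and k = 3 are assumptions made for this example only.

```python
# Illustrative sketch of the classic local directional pattern (LDP).
# NOT the LWDP algorithm from the paper; names and k=3 are assumptions.

# (row, col) offsets of the 8 neighbours, starting East, counter-clockwise.
RING = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def kirsch_responses(img, r, c):
    """Return the 8 Kirsch edge responses for the 3x3 window centred at (r, c)."""
    n = [img[r + dr][c + dc] for dr, dc in RING]
    total = sum(n)
    responses = []
    for i in range(8):
        # Each Kirsch mask weights three consecutive ring neighbours by 5
        # and the remaining five neighbours by -3.
        three = n[(i - 1) % 8] + n[i] + n[(i + 1) % 8]
        responses.append(5 * three - 3 * (total - three))
    return responses


def ldp_code(img, r, c, k=3):
    """Encode pixel (r, c) as an 8-bit code with bits set for the k
    directions of largest absolute response (ties go to the lower index)."""
    resp = kirsch_responses(img, r, c)
    top = sorted(range(8), key=lambda i: (-abs(resp[i]), i))[:k]
    return sum(1 << i for i in top)


def ldp_image(img, k=3):
    """Apply ldp_code to every interior pixel; border pixels are left as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = ldp_code(img, r, c, k)
    return out


if __name__ == "__main__":
    # A tiny grayscale patch with a vertical edge on its right side.
    patch = [
        [0, 0, 9],
        [0, 0, 9],
        [0, 0, 9],
    ]
    print(ldp_image(patch))
```

In a pipeline like the one summarized here, an encoded image of this kind would replace the raw pixels as the recognizer's input. How LWDP actually selects and weights directions is described only in the full paper.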
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2020.114549