DEFUSE: Deep Fused End-to-End Video Text Detection and Recognition


Bibliographic Details
Published in: Revue d'Intelligence Artificielle, Vol. 36, No. 3, p. 459
Main Authors: Chaitra Yuvaraj Lokkondra, Dinesh Ramegowda, Gopalakrishna Madigondanahalli Thimmaiah, Ajay Prakash Bassappa Vijaya
Format: Journal Article
Language: English
Published: Edmonton: International Information and Engineering Technology Association (IIETA), 01.06.2022

Summary: Detecting and recognizing text in natural scene videos and images has attracted growing attention from computer vision researchers, driven by applications such as robotic navigation and traffic sign detection. Optical Character Recognition (OCR) technology is also applied to detect and recognize text on license plates, with commercial uses such as finding stolen cars, calculating parking fees, invoicing tolls, and controlling access to safety zones, and it aids in fraud detection and secure data transactions in the banking industry. Considerable effort is required when scene text in videos exhibits low contrast and motion blur with arbitrary orientations. Present text detection and recognition approaches are largely limited to static images with horizontal or approximately horizontal text. Detecting and recognizing text in videos, with their inherent data dynamicity, is more challenging because of blur caused by defocusing, motion, and illumination changes, as well as arbitrarily shaped text and occlusion. We therefore propose DEFUSE (Deep Fused), which combines the DeepEAST (Deep Efficient and Accurate Scene Text Detector) model with the Keras OCR model to overcome these challenges. This two-stage technique first detects text regions and then deciphers them into a machine-readable format. The proposed method was evaluated on three video datasets: ICDAR 2015, Road Text 1K, and our own video dataset. The results prove more effective in terms of precision, recall, and F1-score.
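The summary reports precision, recall, and F1-score for detected text regions. A minimal sketch of how such detection metrics are commonly computed, assuming axis-aligned boxes greedily matched at an IoU threshold of 0.5 (a common convention, not necessarily the paper's exact evaluation protocol):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def detection_scores(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Greedy one-to-one matching of predictions to ground truth;
    returns (precision, recall, f1)."""
    matched_gt = set()
    tp = 0
    for p in pred_boxes:
        best_j, best_iou = None, iou_thresh
        for j, g in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, one correct detection and one spurious detection against two ground-truth boxes yields precision, recall, and F1 all equal to 0.5.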
ISSN: 0992-499X
1958-5748
DOI: 10.18280/ria.360314