Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts

Bibliographic Details
Published in Mathematics (Basel) Vol. 11; no. 22; p. 4585
Main Authors Park, Youngki; Shin, Youhyun
Format Journal Article
Language English
Published Basel MDPI AG 01.11.2023
Summary: In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally lowering the relevant thresholds. To mitigate false positives, we implement a novel filtering method that dynamically adjusts based on the confidence levels of recognized texts and their corresponding detection thresholds. Additionally, we use straightforward yet effective strategies to enhance the optical character recognition accuracy and speed, such as upscaling, link refinement, perspective transformation, the merging of cropped images, and simple autoregression. Given our focus on Korean chart data, we compile a mix of real-world and artificial Korean chart datasets for experimentation. Our experimental results show that our approach outperforms Tesseract by approximately 7 to 15 times and EasyOCR by 3 to 5 times in accuracy, as measured using a Jaccard similarity-based error rate on our datasets.
ISSN: 2227-7390
DOI: 10.3390/math11224585
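
The summary describes two ideas that a short sketch may make more concrete: lowering the detection threshold step by step to catch more text, and filtering the results with a rule that depends on both the recognition confidence and the threshold at which the text was found. The Python below is only an illustration under stated assumptions, not the authors' implementation. The `run_ocr` callable, the `Recognition` type, the linear filtering rule, the parameters `base_min_conf` and `penalty`, and deduplication by text are all hypothetical. The set-based `jaccard_error` is one plausible reading of a "Jaccard similarity-based error rate"; the paper's exact definition may differ.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Sequence


class Recognition(NamedTuple):
    text: str
    confidence: float  # recognizer confidence in [0, 1]


# Hypothetical OCR hook: takes a detection threshold, returns recognized texts.
OcrFn = Callable[[float], List[Recognition]]


def gradual_ocr(
    run_ocr: OcrFn,
    thresholds: Sequence[float] = (0.7, 0.5, 0.3),
    base_min_conf: float = 0.5,
    penalty: float = 0.5,
) -> List[Recognition]:
    """Run OCR at progressively lower detection thresholds.

    Assumed filtering rule: text found only at a lower threshold must have
    a higher recognition confidence to be accepted.
    """
    ordered = sorted(thresholds, reverse=True)
    strictest = ordered[0]
    accepted: Dict[str, Recognition] = {}
    for threshold in ordered:
        # The required confidence grows as the detection threshold drops.
        min_conf = base_min_conf + penalty * (strictest - threshold)
        for rec in run_ocr(threshold):
            # Simplification: deduplicate by text; a real system would
            # more likely compare bounding boxes.
            if rec.confidence >= min_conf and rec.text not in accepted:
                accepted[rec.text] = rec
    return list(accepted.values())


def jaccard_error(predicted: Iterable[str], expected: Iterable[str]) -> float:
    """Return 1 minus the Jaccard similarity of the two sets of strings."""
    p, e = set(predicted), set(expected)
    if not p and not e:
        return 0.0
    return 1.0 - len(p & e) / len(p | e)
```

In this sketch, a string detected only at the lowest threshold is kept only when the recognizer is quite sure of it, which is one simple way to trade fewer false negatives against a bounded number of false positives.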