OCR with the Deep CNN Model for Ligature Script-Based Languages like Manchu

Bibliographic Details
Published in: Scientific Programming, Vol. 2021, pp. 1-9
Main Authors: Zhang, Diandian; Liu, Yan; Wang, Zhuowei; Wang, Depei
Format: Journal Article
Language: English
Published: New York, Hindawi, 01.06.2021
Hindawi Limited
Summary: Manchu is a low-resource language that has rarely been addressed by text recognition technology. Because of the combination of typefaces, conventional text recognition requires segmentation before recognition, which affects recognition accuracy. In this paper, we propose a Manchu text recognition system divided into two parts: text recognition and text retrieval. First, a deep CNN model performs text recognition, using a sliding window instead of manual segmentation. Second, text retrieval finds similarities within the image and locates the position of the recognized text in the database; this process is described in detail. We conducted comparative experiments on the FAST-NU dataset using different quantities of sample data and compared the results with the latest model. In these experiments, the best result of the proposed deep CNN model reached 98.84%.
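The sliding-window idea mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sliding_windows`, the window size, the stride, and the choice of scanning along the vertical axis (Manchu is traditionally written top to bottom) are all assumptions made for illustration. The sketch only shows how fixed-size crops could be taken from a text image without first segmenting it into characters; each crop would then be passed to a classifier such as a CNN.

```python
import numpy as np


def sliding_windows(image, window, stride, axis=0):
    """Yield (start, crop) pairs by sliding a fixed-size window along one axis.

    Hypothetical helper: `window`, `stride`, and `axis` are illustrative
    parameters, not values taken from the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    length = image.shape[axis]
    # If the image is shorter than the window, a single (short) crop is produced.
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, stride):
        index = [slice(None)] * image.ndim
        index[axis] = slice(start, start + window)
        yield start, image[tuple(index)]


# Toy example: a blank 100x32 grayscale "column" of text scanned top to bottom.
column = np.zeros((100, 32), dtype=np.uint8)
crops = list(sliding_windows(column, window=32, stride=16))
# Note: with these settings the final 4 rows (96..99) are not covered;
# a real pipeline would need to pad the image or add a final window.
```

Overlapping windows (a stride smaller than the window) are a common choice here because a glyph cut by one window boundary is likely to appear whole in a neighboring window.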
ISSN: 1058-9244, 1875-919X
DOI: 10.1155/2021/5520338