Character recognition system for pegon typed manuscript

Bibliographic Details
Published in	Heliyon Vol. 10; no. 16; p. e35959
Main Authors	Ruldeviyani, Yova; Suhartanto, Heru; Sotardodo, Beltsazar Anugrah; Fahreza, Muhammad Hanif; Septiano, Andre; Rachmadi, Muhammad Febrian
Format	Journal Article
Language	English
Published	England: Elsevier Ltd, 30.08.2024
Subjects	Arabic; Character recognition; Deep learning; Pegon; Segmentation
Summary:	The Pegon script is an Arabic-based writing system used for Javanese, Sundanese, Madurese, and Indonesian languages. Due to various reasons, this script is now mainly found among collectors and private Islamic boarding schools (pesantren), creating a need for its preservation. One preservation method is digitization through transcription into machine-encoded text, known as OCR (Optical Character Recognition). No published literature exists on OCR systems for this specific script. This research explores the OCR of Pegon typed manuscripts, introducing novel synthesized and real annotated datasets for this task. These datasets evaluate proposed OCR methods, especially those adapted from existing Arabic OCR systems. Results show that deep learning techniques outperform conventional ones, which fail to detect Pegon text. The proposed system uses YOLOv5 for line segmentation and a CTC-CRNN architecture for line text recognition, achieving an F1-score of 0.94 for segmentation and a CER of 0.03 for recognition.
ISSN:	2405-8440
DOI:	10.1016/j.heliyon.2024.e35959