Tulu Manuscript OCR: Preserving Ancient Wisdom through Character Recognition
Published in | 2024 Second International Conference on Data Science and Information System (ICDSIS), pp. 1-7 |
---|---|
Format | Conference Proceeding |
Language | English |
Publisher | IEEE |
Published | 17.05.2024 |
Summary | Tulu, spoken largely in coastal Karnataka, has a distinct script that was traditionally written on palm leaves. This study addresses the scarcity of efficient optical character recognition (OCR) solutions for this script, employing machine learning classifiers including decision tree, k-nearest neighbors (KNN), and random forest. The system achieves its highest accuracy, 92.35%, with the random forest classifier. Its ability to handle diverse font styles and sizes is crucial for Tulu character recognition, and a classifier-level fusion strategy further improves recognition accuracy, which matters given the intricate shapes of Tulu characters. This research advances OCR technology for Indian languages, addressing the specific needs of the Tulu script. The high accuracy of the random forest classifier suggests potential for broader applications, and the proposed Tulu Character Recognition System is a significant step toward closing the OCR gap for Indian languages, with promise for future language technology. |
---|---|
DOI | 10.1109/ICDSIS61070.2024.10594489 |
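The summary mentions a classifier-level fusion strategy but does not say how the decision tree, KNN, and random forest outputs are combined. One common scheme is hard majority voting over the labels each classifier predicts for a character image. The sketch below assumes that scheme and is not the authors' method. The function name `fuse_by_majority_vote`, the classifier keys, and the rule that ties go to the random forest (chosen only because it had the highest reported single-model accuracy) are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fuse_by_majority_vote(
    predictions: Dict[str, Sequence[str]],
    tie_breaker: str = "random_forest",  # hypothetical tie-breaking preference
) -> List[str]:
    """Fuse per-classifier label predictions by hard majority voting.

    `predictions` maps a classifier name to its predicted labels, one per
    character image, all in the same sample order.
    """
    names = list(predictions)
    lengths = {len(predictions[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all classifiers must predict the same number of samples")
    if tie_breaker not in predictions:
        raise ValueError(f"unknown tie_breaker: {tie_breaker}")
    (n_samples,) = lengths

    fused: List[str] = []
    for i in range(n_samples):
        # Count how many classifiers voted for each label on sample i.
        votes = Counter(predictions[name][i] for name in names)
        ranked = votes.most_common()  # ties keep first-seen order
        top_count = ranked[0][1]
        winners = {label for label, count in ranked if count == top_count}
        if len(winners) == 1:
            fused.append(ranked[0][0])
        else:
            # Tie: prefer the tie-breaker's label if it is among the leaders,
            # otherwise take the first leading label in classifier order.
            preferred = predictions[tie_breaker][i]
            fused.append(preferred if preferred in winners else ranked[0][0])
    return fused


if __name__ == "__main__":
    # Hypothetical transliterated labels for three character images.
    preds = {
        "decision_tree": ["ka", "ga", "ta"],
        "knn": ["ka", "ja", "da"],
        "random_forest": ["kha", "ja", "na"],
    }
    print(fuse_by_majority_vote(preds))  # ['ka', 'ja', 'na']
```

With three classifiers, a tie can only occur when all three disagree. In that case this sketch defers to the random forest. A soft-voting variant that averages class probabilities is an equally plausible reading of "classifier-level fusion".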