Deep Learning Based Sinhala Optical Character Recognition (OCR)

Bibliographic Details
Published in: 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer), pp. 298-299
Main Authors: Anuradha, Isuri; Liyanage, Chamila; Wijayawardhana, Harsha; Weerasinghe, Ruvan
Format: Conference Proceeding
Language: English
Published: IEEE, 04.11.2020
Summary: With the advancement of computer technology over the last few years, researchers have applied machine learning and deep learning techniques to analyse text in digital documents. As a result, Optical Character Recognition (OCR) technology is increasingly used to convert printed text into machine-readable text for different character sets. Sinhala is an abugida script with its own writing system, used to write the Sinhala and Pali languages. The complexity of the Sinhala script makes it hard to develop an OCR system for it. In the recent literature, most research groups try to reduce the complexity of the Sinhala script with the support of computer science techniques and neural networks [1], [2]. Tesseract is an open-source, deep-learning-based OCR engine developed by Google [3]. Despite decades of research on the engineering aspects, this work attempts to improve the accuracy of Sinhala character recognition using deep learning.
ISSN:2472-7598
DOI:10.1109/ICTer51097.2020.9325428