Deep Learning Based Sinhala Optical Character Recognition (OCR)


Bibliographic Details
Published in	2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer), pp. 298-299
Main Authors	Anuradha, Isuri; Liyanage, Chamila; Wijayawardhana, Harsha; Weerasinghe, Ruvan
Format	Conference Proceeding
Language	English
Published	IEEE 04.11.2020
Subjects	Character recognition; Deep learning; Engines; Integrated optics; Linguistics; Optical Character Recognition; Optical character recognition software; Optical imaging; Sinhala OCR; Tesseract

Summary:	With the advancement of computer technology over the last few years, researchers have integrated machine learning and deep learning techniques to analyse textual representations in digital documents. As a result, Optical Character Recognition (OCR) technology is increasingly used to convert printed text into machine-operable text for different character sets. Sinhala, an abugida script, has its own writing system, which is used to write the Sinhala and Pali languages. The complexities of the Sinhala script make it hard to develop an OCR system. In recent literature, most research groups try to reduce the complex nature of the Sinhala script with the support of computer science and neural networks [1], [2]. Tesseract is an open-source, deep-learning-based OCR engine developed by Google [3]. Despite decades of research on the engineering aspects, this work attempts to improve the accuracy of Sinhala character recognition using deep learning mechanisms.
ISSN:	2472-7598
DOI:	10.1109/ICTer51097.2020.9325428