Segmentation of Bangla Compound Characters: Underlying Simple Character Detection from Handwritten Compound Characters Using YOLOv8

Bibliographic Details
Published in	2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), pp. 1-6
Main Authors	Islam Bhuiyan, Md Raihanul; Efaz, Mahin Shahriar; Reza, Tanjim; Ria, Aditi Saha; Reza, Md. Tanzim; Hossain, Muhammad Iqbal
Format	Conference Proceeding
Language	English
Published	IEEE 02.05.2024
Subjects	Bangla Text Recognition; Communications technology; Compound Characters; Compounds; Handwriting recognition; Image segmentation; Object detection; Text recognition; Training; YOLOv8 models

Summary:	Bangla is one of the most widely spoken languages in the world; more than 210 million people use it as their first or second language. Bangla literature has a rich history that dates back thousands of years. However, some Bangla characters have a compound structure, in which one character is made up of more than one simple character. Although there is a large body of work on Bangla character recognition, the structure of compound characters makes detection difficult. The existing approach to compound character detection uses a list of compound characters as the dataset, trains models on the whole image, and detects the characters. This approach may yield insufficient accuracy, because an enormous number of character combinations can occur and it is often not possible to include them all in the training set. To overcome this problem, our research focuses on detecting the character type (simple or compound) using the YOLOv8 model and, if the character is compound, detecting the underlying simple characters inside it. To conduct our research, we created a new Bengali handwritten character dataset called "BanglaBorno", as the existing datasets were limited in the quantity of compound characters or the quality of the images. The YOLOv8 classification model achieved 81.8% accuracy and the YOLOv8 object detection model achieved 82.44% accuracy on the overall dataset.
ISSN:	2769-5700
DOI:	10.1109/ICEEICT62016.2024.10534334
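
The two-stage flow described in the summary (classify the character type first, then detect the underlying simple characters only when the type is compound) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `analyze_character`, `Component`, `CharacterAnalysis`, `classify_type`, `detect_components`, and the `min_confidence` threshold are all hypothetical, and the two callables stand in for whatever trained classification and object detection models are actually used.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# Bounding box as (x_min, y_min, x_max, y_max) in pixels (assumed convention).
Box = Tuple[float, float, float, float]


@dataclass
class Component:
    """One simple character found inside a compound character (hypothetical type)."""
    label: str
    box: Box
    confidence: float


@dataclass
class CharacterAnalysis:
    """Result of the two-stage analysis (hypothetical type)."""
    kind: str  # "simple" or "compound"
    components: List[Component] = field(default_factory=list)


def analyze_character(
    image: Any,
    classify_type: Callable[[Any], str],
    detect_components: Callable[[Any], List[Component]],
    min_confidence: float = 0.25,  # assumed threshold, not taken from the paper
) -> CharacterAnalysis:
    """Stage 1: classify as simple or compound. Stage 2: detect components if compound."""
    kind = classify_type(image)
    if kind not in ("simple", "compound"):
        raise ValueError(f"unexpected character type: {kind!r}")

    # Simple characters have no underlying components, so the detector is skipped.
    if kind == "simple":
        return CharacterAnalysis(kind=kind)

    # Keep only detections at or above the confidence threshold.
    kept = [c for c in detect_components(image) if c.confidence >= min_confidence]
    return CharacterAnalysis(kind=kind, components=kept)


if __name__ == "__main__":
    # Stub models for demonstration only; real ones would wrap trained networks.
    def fake_classifier(_image: Any) -> str:
        return "compound"

    def fake_detector(_image: Any) -> List[Component]:
        return [
            Component("ka", (2.0, 3.0, 20.0, 30.0), 0.91),
            Component("ssa", (18.0, 3.0, 40.0, 30.0), 0.10),  # below threshold
        ]

    result = analyze_character("image-placeholder", fake_classifier, fake_detector)
    print(result.kind, [c.label for c in result.components])
```

Passing the models in as callables keeps the control flow independent of any particular library, so the same function works with stubs (as here) or with wrappers around trained models.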