Beyond Language Boundaries: Analysis and Ensemble Approach of Hate Speech Detection in South Asian Social Media
Published in | 2023 26th International Conference on Computer and Information Technology (ICCIT), pp. 1-6 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 13.12.2023 |
Summary: | Social media has become a common platform for connecting with people all over the world. Its increased use sometimes exposes users to serious risk, and user anonymity on social media is a major facilitator of the spread of hate within a community. Several studies on detecting such offensive speech have been published recently, but the majority deal with English-language data because of its availability. The primary goal of this paper is to present a comprehensive approach to hate speech detection that focuses specifically on South Asian languages. The paper conducts a comparative analysis of various machine learning classifiers on four languages: Bangla, Indonesian, Urdu and Sinhala. An ensemble approach consisting of XGBoost, CatBoost and LightGBM is also used to improve overall performance, resulting in an accuracy of 92% for Bangla, 85% for Indonesian, 90% for Urdu and 85% for Sinhala. Finally, the comparative analysis shows that MLP (Multi-Layer Perceptron) is the most effective of the 9 classifiers, confirming the potential of the approach for efficient hate speech detection in the four languages. |
---|---|
DOI: | 10.1109/ICCIT60459.2023.10441513 |
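
The summary names the three ensemble members (XGBoost, CatBoost and LightGBM) but does not say how their predictions are combined. As a minimal sketch, assuming soft voting (averaging each model's class probabilities), the combination step could look like the following. The function names, label names and probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities that several models give one sample."""
    if not prob_rows:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("all models must score the same number of classes")
    n_models = len(prob_rows)
    return [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]


def predict_label(prob_rows: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Return the label whose averaged probability is highest."""
    averaged = soft_vote(prob_rows)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best_index]


# Hypothetical labels and probabilities for a single post (illustrative only).
LABELS = ("not_hate", "hate")
model_probs = [
    [0.40, 0.60],  # stand-in for an XGBoost model's output
    [0.55, 0.45],  # stand-in for a CatBoost model's output
    [0.30, 0.70],  # stand-in for a LightGBM model's output
]

print(predict_label(model_probs, LABELS))
```

In this made-up example one model leans toward `not_hate`, but the averaged probabilities still favour `hate`, which is the behaviour soft voting is meant to provide. Hard (majority) voting or stacking would be equally plausible readings of "ensemble approach" here.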