Short Text Classification Based on Hybrid Semantic Expansion and Bidirectional GRU (BiGRU) Based Method to Improve Hate Speech Detection

The persistent prevalence of hate speech on contemporary social media platforms demands advanced detection methods to address specific categories and levels of offenses. This research focuses on enhancing hate speech detection by refining text representation through a semantic expansion approach, su...

Full description

Saved in:

Bibliographic Details
Published in	Revue d'Intelligence Artificielle Vol. 37; no. 6; p. 1471
Main Authors	Muzakir, Ari, Kusworo Adi, Kusumaningrum, Retno
Format	Journal Article
Language	English
Published	Edmonton International Information and Engineering Technology Association (IIETA) 01.12.2023
Subjects	Algorithms Artificial neural networks Classification Datasets Hate speech Knowledge Knowledge bases (artificial intelligence) Machine learning Natural language Neural networks Semantics Social networks Sparsity Spelling Text categorization Word sense disambiguation Words (language)
Online Access	Get full text

Cover

Loading…

More Information
Summary:	The persistent prevalence of hate speech on contemporary social media platforms demands advanced detection methods to address specific categories and levels of offenses. This research focuses on enhancing hate speech detection by refining text representation through a semantic expansion approach, surpassing the limitations of conventional methods. The back-translation technique is employed to enhance sentence structure. Initially, the Lesk Algorithm is utilized for word disambiguation in the semantic expansion process, identifying word meanings within relevant contexts. Subsequently, knowledge bases from WordNet and Kateglo are leveraged to enrich contextual information. The final step involves using Cosine Similarity to select the most appropriate words based on the highest scores. The combined semantic expansion technique significantly improves classification performance compared to conventional methods. Data, with and without semantic expansion, is vectorized into the BERT embedding space and classified using deep learning models such as CNN, BiGRU, and BiLSTM. The proposed approach consistently demonstrates high accuracy across all model types: CNN (88%), BiGRU (88.3%), and BiLSTM (87.3%). In contrast, models without semantic expansion yield relatively lower results-CNN (83.6%), BiGRU (83.3%), and BiLSTM (83.1%). This underscores the substantial breakthrough of the semantic expansion approach in overcoming challenges related to data distribution and semantic feature scarcity, ultimately resulting in improved classification performance.
ISSN:	0992-499X 1958-5748
DOI:	10.18280/ria.370611