Short Text Classification Based on Hybrid Semantic Expansion and Bidirectional GRU (BiGRU) Based Method to Improve Hate Speech Detection
Published in | Revue d'Intelligence Artificielle Vol. 37; no. 6; p. 1471 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Edmonton: International Information and Engineering Technology Association (IIETA), 01.12.2023 |
Subjects | |
Summary: | The persistent prevalence of hate speech on contemporary social media platforms demands advanced detection methods to address specific categories and levels of offenses. This research focuses on enhancing hate speech detection by refining text representation through a semantic expansion approach, surpassing the limitations of conventional methods. The back-translation technique is employed to enhance sentence structure. Initially, the Lesk Algorithm is utilized for word disambiguation in the semantic expansion process, identifying word meanings within relevant contexts. Subsequently, knowledge bases from WordNet and Kateglo are leveraged to enrich contextual information. The final step involves using Cosine Similarity to select the most appropriate words based on the highest scores. The combined semantic expansion technique significantly improves classification performance compared to conventional methods. Data, with and without semantic expansion, is vectorized into the BERT embedding space and classified using deep learning models such as CNN, BiGRU, and BiLSTM. The proposed approach consistently demonstrates high accuracy across all model types: CNN (88%), BiGRU (88.3%), and BiLSTM (87.3%). In contrast, models without semantic expansion yield relatively lower results: CNN (83.6%), BiGRU (83.3%), and BiLSTM (83.1%). This underscores the substantial breakthrough of the semantic expansion approach in overcoming challenges related to data distribution and semantic feature scarcity, ultimately resulting in improved classification performance. |
---|---|
ISSN: | 0992-499X 1958-5748 |
DOI: | 10.18280/ria.370611 |
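
The final step described in the summary, choosing an expansion word by the highest cosine similarity score, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `select_expansion`, the bag-of-words count vectors, and the toy glosses are all assumptions made for illustration. The record does not say how the candidates are vectorized, and a real pipeline would draw its glosses from WordNet or Kateglo.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def select_expansion(context_tokens, candidates):
    """Score each candidate gloss against the context; return the best word.

    `candidates` maps a hypothetical expansion word to a gloss string.
    """
    context_vec = Counter(context_tokens)
    scores = {
        word: cosine_similarity(context_vec, Counter(gloss.lower().split()))
        for word, gloss in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (assumed, not taken from the paper or from any knowledge base).
context = "he deposited money at the bank".split()
candidates = {
    "shore": "sloping land beside a body of water",
    "lender": "institution that accepts money deposits and lends money",
}
best, scores = select_expansion(context, candidates)
print(best, scores)
```

In this toy case the financial gloss shares the token "money" with the context, so it outscores the riverbank gloss. Count vectors are used only to keep the sketch dependency-free; dense embeddings could be substituted without changing the selection logic.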