Advancing Arabic Hate Speech Detection via Neural Transfer Learning with BERT

Bibliographic Details
Published in: 2023 3rd International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), pp. 1-8
Main Authors: Naji, Ezzaldeen Mahyoub; Maslekar, Ajit A.; Ahmed, Zeyad A. T.; Alharbi, Alhasan; Al-sellami, Belal; Tawfik, Mohammed
Format: Conference Proceeding
Language: English
Published: IEEE, 29.12.2023

Summary: Online hate speech poses grave societal dangers, necessitating automatic detection systems, yet limited effort has focused on Arabic's complex dialects. This study investigates neural transfer learning for dialectal Arabic hate speech detection. A public Levantine Twitter dataset of 5,846 expert-annotated tweets, spanning normal, offensive, and hate speech, is compiled. Traditional machine learning models, including SVM, logistic regression, and gradient boosting, are benchmarked and achieve 80-83% accuracy; however, these models fail to capture the nuanced contextual differences between offensive and hateful language. To address this, transfer learning is proposed using the pretrained Arabic BERT model ArabERT, which leverages BERT's bidirectional representations to model linguistic context. ArabERT is fine-tuned on the Levantine dataset for hate speech classification. Results show that ArabERT significantly outperforms the machine learning models, attaining 90% accuracy and a 94% F1-score specifically for hate speech detection. Detailed analysis demonstrates that ArabERT's contextual modeling enables nuanced discernment between offensive and hateful tweets. The outcomes provide strong evidence that transfer learning approaches like ArabERT are crucial for handling informal, multi-dialect Arabic. This work makes three key contributions: an effective neural framework for Arabic hate speech detection, rigorous benchmarking, and insights into deep learning strategies. The findings showcase transfer learning's efficacy for low-resource Arabic NLP and pave promising directions for future progress.
DOI: 10.1109/SMARTGENCON60755.2023.10441885
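
As an illustration of the transfer-learning step described in the summary, the following is a minimal sketch of fine-tuning a pretrained Arabic BERT checkpoint for three-way (normal / offensive / hate) tweet classification with the Hugging Face Trainer. This is not the authors' code: the checkpoint name, CSV file name, column names, label names, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hedged fine-tuning sketch for dialectal Arabic hate speech classification.
# Assumes a CSV file "levantine_tweets.csv" with columns "tweet" and "label"
# (normal / offensive / hate) and an AraBERT-style checkpoint; all names and
# hyperparameters below are assumptions for illustration only.
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CHECKPOINT = "aubmindlab/bert-base-arabertv02"       # assumed pretrained model
LABEL2ID = {"normal": 0, "offensive": 1, "hate": 2}  # assumed label names


class TweetDataset(torch.utils.data.Dataset):
    """Tokenized tweets plus integer labels in the format Trainer expects."""

    def __init__(self, texts, labels, tokenizer):
        self.encodings = tokenizer(list(texts), truncation=True,
                                   padding=True, max_length=128)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Load the labeled tweets and hold out a stratified test split.
df = pd.read_csv("levantine_tweets.csv")             # hypothetical file name
df["y"] = df["label"].map(LABEL2ID)
train_df, test_df = train_test_split(df, test_size=0.2, stratify=df["y"],
                                     random_state=42)

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABEL2ID))

# Fine-tune the pretrained encoder end to end on the three-class task.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="arabert-hate-speech",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=TweetDataset(train_df["tweet"], train_df["y"], tokenizer),
    eval_dataset=TweetDataset(test_df["tweet"], test_df["y"], tokenizer),
)
trainer.train()
print(trainer.evaluate())   # reports evaluation loss on the held-out split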