BERT-based Approach to Arabic Hate Speech and Offensive Language Detection in Twitter: Exploiting Emojis and Sentiment Analysis

Bibliographic Details
Published in International Journal of Advanced Computer Science & Applications, Vol. 13, No. 5
Main Author Althobaiti, Maha Jarallah
Format Journal Article
Language English
Published West Yorkshire Science and Information (SAI) Organization Limited 01.01.2022
Abstract User-generated content on the internet, including social media, may contain offensive language and hate speech, which negatively affect the mental health of the online community and may lead to hate crimes. Intelligent models for the automatic detection of offensive language and hate speech have attracted significant attention recently. In this paper, we propose an automatic method for detecting offensive language and fine-grained hate speech in Arabic tweets. We compare BERT with two conventional machine learning techniques (SVM and logistic regression). We also investigate the use of sentiment analysis and emoji descriptions as features appended to the textual content of the tweets. The experiments show that the BERT-based model gives the best results, surpassing the best benchmark systems in the literature on all three tasks: (a) offensive language detection with 84.3% F1-score, (b) hate speech detection with 81.8% F1-score, and (c) fine-grained hate speech recognition (e.g., race, religion, social class) with 45.1% F1-score. The use of sentiment analysis slightly improves the performance of the models when detecting offensive language and hate speech but has no positive effect when recognising the type of hate speech. Using textual emoji descriptions as features can improve or degrade performance, depending on the number of examples per class and on whether the emojis are distinctive features between classes.
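The feature-appending idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names `emoji_descriptions` and `augment_tweet`, the `[SEP]` separator, the use of English Unicode character names as emoji descriptions, and the sentiment label passed in as a plain string are all assumptions made for illustration. The record does not say how the descriptions or sentiment labels were obtained.

```python
import unicodedata


def emoji_descriptions(text):
    """Return lowercase Unicode names of symbol characters found in text.

    Assumption: characters in Unicode category 'So' (Symbol, other) are
    treated as emojis, and the Unicode character name stands in for the
    textual emoji description mentioned in the abstract.
    """
    names = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                names.append(name.lower())
    return names


def augment_tweet(text, sentiment=None, sep=" [SEP] "):
    """Append emoji descriptions and an optional sentiment label to a tweet.

    Hypothetical helper: the separator and the ordering of the appended
    features are illustrative choices, not taken from the paper.
    """
    parts = [text]
    descriptions = emoji_descriptions(text)
    if descriptions:
        parts.append(" ".join(descriptions))
    if sentiment is not None:
        parts.append(sentiment)
    return sep.join(parts)


if __name__ == "__main__":
    # The augmented string would then be fed to a text classifier.
    print(augment_tweet("what a match \U0001F602", sentiment="positive"))
```

The augmented string could be passed to any text classifier in place of the raw tweet, which is one simple way to compare models with and without the extra features.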
Copyright 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.14569/IJACSA.2022.01305109
Discipline Computer Science
Religion
EISSN 2156-5570
ISSN 2158-107X
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Subject Terms
Arabic language
Data mining
Deep learning
Distinctive features
Emojis
Emotional icons
Hate speech
Internet
Language
Machine learning
Mental health
Performance enhancement
Religion
Sentiment analysis
Social classes
Social media
Speech recognition
User generated content
URI https://www.proquest.com/docview/2681641816