BERT-based Approach to Arabic Hate Speech and Offensive Language Detection in Twitter: Exploiting Emojis and Sentiment Analysis
Published in | International journal of advanced computer science & applications Vol. 13; no. 5 |
---|---|
Main Author | Althobaiti, Maha Jarallah |
Format | Journal Article |
Language | English |
Published | West Yorkshire: Science and Information (SAI) Organization Limited, 01.01.2022 |
Subjects | |
Abstract | User-generated content on the internet, including content on social media, may contain offensive language and hate speech, which negatively affect the mental health of the wider internet community and may lead to hate crimes. Intelligent models for the automatic detection of offensive language and hate speech have attracted significant attention recently. In this paper, we propose an automatic method for detecting offensive language and fine-grained hate speech in Arabic tweets. We compare BERT with two conventional machine learning techniques (SVM and logistic regression). We also investigate the use of sentiment analysis and emoji descriptions as features appended to the textual content of the tweets. The experiments show that the BERT-based model gives the best results, surpassing the best benchmark systems in the literature on all three tasks: (a) offensive language detection with an 84.3% F1-score, (b) hate speech detection with an 81.8% F1-score, and (c) fine-grained hate speech recognition (e.g., race, religion, social class) with a 45.1% F1-score. The use of sentiment analysis slightly improves the performance of the models when detecting offensive language and hate speech but has no positive effect when recognising the type of hate speech. The use of textual emoji descriptions as features can improve or degrade the performance of the models, depending on the number of examples per class and whether the emojis are among the distinctive features between classes. |
---|---|
Author | Althobaiti, Maha Jarallah |
CitedBy_id | crossref_primary_10_7717_peerj_cs_1966 crossref_primary_10_1007_s42044_025_00247_7 crossref_primary_10_1515_lpp_2024_0034 crossref_primary_10_3389_frai_2024_1345445 crossref_primary_10_5715_jnlp_31_1598 crossref_primary_10_1145_3677176 crossref_primary_10_7717_peerj_cs_1617 |
ContentType | Journal Article |
Copyright | 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.14569/IJACSA.2022.01305109 |
Discipline | Computer Science; Religion |
EISSN | 2156-5570 |
ISSN | 2158-107X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 5 |
Language | English |
PublicationDate | 20220101 |
PublicationPlace | West Yorkshire |
PublicationTitle | International journal of advanced computer science & applications |
PublicationYear | 2022 |
Publisher | Science and Information (SAI) Organization Limited |
SubjectTerms | Arabic language; Data mining; Deep learning; Distinctive features; Emojis; Emotional icons; Hate speech; Internet; Language; Machine learning; Mental health; Performance enhancement; Religion; Sentiment analysis; Social classes; Social media; Speech recognition; User generated content |
URI | https://www.proquest.com/docview/2681641816 |
Volume | 13 |
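
The abstract describes appending emoji descriptions and a sentiment label to the text of each tweet before classification. The record does not say how the paper does this, so the following is only a minimal sketch of the general idea. The function name `append_features`, the two-entry emoji lookup, the `[SEP]` separator, and the `sentiment:` prefix are all assumptions made for illustration and are not taken from the article.

```python
from typing import Dict, Optional

# Hypothetical emoji-to-description lookup (illustrative only);
# a real system would draw on a complete emoji lexicon.
EMOJI_DESCRIPTIONS: Dict[str, str] = {
    "😡": "angry face",
    "😂": "face with tears of joy",
}


def append_features(
    tweet: str,
    sentiment: Optional[str] = None,
    emoji_map: Dict[str, str] = EMOJI_DESCRIPTIONS,
    sep: str = " [SEP] ",
) -> str:
    """Return the tweet text with emoji descriptions and a sentiment label appended.

    The separator and the feature layout are assumptions, not the paper's format.
    """
    parts = [tweet]

    # Look up each character of the tweet; only single-code-point emojis
    # present in the lookup are matched, in order of appearance.
    descriptions = [emoji_map[ch] for ch in tweet if ch in emoji_map]
    if descriptions:
        parts.append(" ".join(descriptions))

    # The sentiment label is assumed to come from a separate sentiment analyser.
    if sentiment is not None:
        parts.append(f"sentiment: {sentiment}")

    return sep.join(parts)
```

The resulting string could then be passed to any text classifier (BERT, SVM, or logistic regression) in place of the raw tweet. Keeping the extra features as plain text means the same input pipeline serves all three model types.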