Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques

Bibliographic Details
Published in: Applied Artificial Intelligence, Vol. 39, No. 1
Main Authors: Hu, Ya-Han; Liu, Ting-Hsuan; Tsai, Chih-Fong; Lin, Yu-Jung
Format: Journal Article
Language: English
Published: Taylor & Francis Group, 31 December 2025

Abstract

The rise of social media has amplified online sharing, making it necessary for businesses to understand public sentiment. Traditional sentiment analysis struggles with sarcasm detection and class imbalance. To address this, we introduce Synthetic Ensemble Oversampling (SEO) methods, which combine the strengths of multiple oversampling algorithms. By incorporating ensemble learning principles into oversampling, the proposed methods offer distinct strategies for selecting newly generated sarcastic samples. In this study, we employ five oversampling algorithms: Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), polynom-fit-SMOTE, Proximity Weighted Synthetic Sampling (ProWSyn), and SMOTE with Iterative-Partitioning Filter (SMOTE_IPF). We work with two imbalanced sarcasm detection datasets, iSarcasmEval and SARC-reduced, collected from Twitter and Reddit, respectively. After extracting features with Word2Vec, Global Vectors (GloVe), and FastText, we apply the oversampling and ensemble techniques. Across six classifiers (Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, Logistic Regression, and BERT), the results show that the SEO2 framework consistently improves classifier performance compared with single oversampling techniques. Notably, the Cluster Uncentered method most often provides the largest improvements across datasets, achieving significant gains in both AUC and F1 score. These findings highlight the potential of ensemble-based oversampling for addressing class imbalance in sarcasm detection.
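SMOTE is the first of the five oversamplers listed above, and SMOTE_IPF builds directly on it. As a rough illustration of what "synthetic oversampling" means, the sketch below implements plain SMOTE interpolation (Chawla et al., 2002) on toy vectors standing in for Word2Vec/GloVe/FastText features of the minority (sarcastic) class. It is a minimal sketch, not the authors' SEO framework or their SEO2 / Cluster Uncentered selection strategies; the function names `smote` and `k_nearest`, the toy data, and the default `k=5` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], i: int, k: int) -> List[int]:
    """Return indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority vectors by SMOTE-style linear interpolation.

    Each new vector lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper for illustration, not the paper's implementation.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)          # seeded for reproducible output
    k = min(k, len(minority) - 1)      # cannot have more neighbours than other samples
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # base sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nn = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D vectors standing in for embeddings of sarcastic texts (assumed data).
    sarcastic = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.95], [0.30, 0.70]]
    for vec in smote(sarcastic, n_new=3, k=2):
        print([round(v, 3) for v in vec])
```

Each synthetic vector is a convex combination of two nearby minority samples, so it stays inside the region the sarcastic class already occupies. In practice a maintained implementation such as imbalanced-learn's `SMOTE` class (via `fit_resample`) would normally be preferred over hand-rolled code.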

ORCID (Liu, Ting-Hsuan): 0009-0001-3790-0980
DOI: 10.1080/08839514.2025.2468534
Discipline: Computer Science
EISSN: 1087-6545
ISSN: 0883-9514
Open Access: Yes
Peer Reviewed: Yes
Open Access Link: https://doaj.org/article/0bb162e14e68444091ac9a00ed044cf3