Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques

Bibliographic Details
Published in	Applied artificial intelligence Vol. 39; no. 1
Main Authors	Hu, Ya-Han, Liu, Ting-Hsuan, Tsai, Chih-Fong, Lin, Yu-Jung
Format	Journal Article
Language	English
Published	Taylor & Francis Group 31.12.2025

Abstract	The rise of social media has amplified online sharing, making it necessary for businesses to understand public sentiment. Traditional sentiment analysis struggles with sarcasm detection and class imbalance. To address this, we introduce Synthetic Ensemble Oversampling (SEO) methods that combine the strengths of several oversampling algorithms. By incorporating ensemble learning principles into oversampling, the proposed methods offer distinct strategies for selecting newly generated sarcastic data. In this study, we employ five oversampling algorithms: Synthetic Minority Oversampling TEchnique (SMOTE), Adaptive Synthetic Sampling (ADASYN), polynom-fit-SMOTE, Proximity Weighted Synthetic Sampling (ProWSyn), and SMOTE with Instance Prioritization and Filtering (SMOTE_IPF). We work with two imbalanced sarcasm detection datasets, iSarcasmEval and SARC-reduced, collected from Twitter and Reddit. After extracting features using Word2Vec, Global Vectors (GloVe), and FastText, we apply oversampling and ensemble techniques. Across six classifiers (Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, Logistic Regression, and BERT), the results demonstrate that the SEO2 framework consistently enhances classifier performance compared to single oversampling techniques. Notably, the Cluster Uncentered method frequently provides the best improvements across datasets, achieving significant gains in both AUC and F1 scores. These findings highlight the potential of ensemble-based oversampling in addressing class imbalance for sarcasm detection.
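
The general idea of pooling synthetic minority samples from more than one generator and then selecting among them can be illustrated with a small sketch. This is not the authors' SEO procedure: the abstract does not specify the selection strategies, so the "keep the candidates closest to the minority centroid" rule below is a hypothetical stand-in, and the `jitter` generator is an assumed placeholder for a second oversampler such as ADASYN. Only the SMOTE-style interpolation step follows the well-known technique. All function names are illustrative.

```python
import math
import random


def smote_like(minority, n_new, k=2, rng=None):
    """SMOTE-style generator: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest other minority points by Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def jitter(minority, n_new, scale=0.05, rng=None):
    """Hypothetical second generator (placeholder, not from the paper):
    Gaussian noise around existing minority points."""
    rng = rng or random.Random(0)
    return [
        tuple(x + rng.gauss(0.0, scale) for x in rng.choice(minority))
        for _ in range(n_new)
    ]


def pooled_oversample(minority, majority, generators, rng=None):
    """Pool candidates from every generator, then keep just enough of them
    to balance the classes. Selection rule (an assumption): prefer the
    candidates closest to the minority centroid."""
    n_needed = len(majority) - len(minority)
    if n_needed <= 0:
        return []
    pool = []
    for generate in generators:
        pool.extend(generate(minority, n_needed, rng=rng))
    centroid = tuple(sum(col) / len(minority) for col in zip(*minority))
    pool.sort(key=lambda p: math.dist(p, centroid))
    return pool[:n_needed]


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    majority = [(5.0 + i, 5.0) for i in range(7)]
    new_points = pooled_oversample(
        minority, majority, [smote_like, jitter], rng=random.Random(42)
    )
    print(len(new_points), "synthetic minority samples generated")
```

In practice the feature vectors would be the Word2Vec, GloVe, or FastText embeddings mentioned in the abstract rather than 2-D toy points, and the generators would be established implementations of the five listed algorithms.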
DOI	10.1080/08839514.2025.2468534
Discipline	Computer Science
EISSN	1087-6545
ISSN	0883-9514
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
ORCID	0009-0001-3790-0980 (Liu, Ting-Hsuan)
OpenAccessLink	https://doaj.org/article/0bb162e14e68444091ac9a00ed044cf3