Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques
Published in | Applied artificial intelligence Vol. 39; no. 1 |
---|---|
Main Authors | Hu, Ya-Han; Liu, Ting-Hsuan; Tsai, Chih-Fong; Lin, Yu-Jung |
Format | Journal Article |
Language | English |
Published | Taylor & Francis Group, 31.12.2025 |
Abstract | The rise of social media has amplified online sharing, making it necessary for businesses to understand public sentiment. Traditional sentiment analysis struggles with sarcasm detection and class imbalance. To address this, we introduce Synthetic Ensemble Oversampling (SEO) methods that leverage the strengths of several oversampling algorithms. By incorporating ensemble learning principles into oversampling, the proposed methods offer distinct strategies for selecting newly generated sarcastic data. In this study, we employ five oversampling algorithms: Synthetic Minority Oversampling TEchnique (SMOTE), Adaptive Synthetic Sampling (ADASYN), polynom-fit-SMOTE, Proximity Weighted Synthetic Sampling (ProWSyn), and SMOTE with Instance Prioritization and Filtering (SMOTE_IPF). We work with two imbalanced sarcasm detection datasets, iSarcasmEval and SARC-reduced, collected from Twitter and Reddit. After extracting features using Word2Vec, Global Vectors (GloVe), and FastText, we apply oversampling and ensemble techniques. Evaluated across six classifiers – Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, Logistic Regression, and BERT – the results demonstrate that the SEO2 framework consistently enhances classifier performance compared to single oversampling techniques. Notably, the Cluster Uncentered method frequently provides the best improvements across datasets, achieving significant gains in both AUC and F1 scores. These findings highlight the potential of ensemble-based oversampling in addressing class imbalance for sarcasm detection. |
---|---|
Author | Lin, Yu-Jung; Hu, Ya-Han; Liu, Ting-Hsuan; Tsai, Chih-Fong |
Author_xml | – sequence: 1 givenname: Ya-Han surname: Hu fullname: Hu, Ya-Han – sequence: 2 givenname: Ting-Hsuan orcidid: 0009-0001-3790-0980 surname: Liu fullname: Liu, Ting-Hsuan – sequence: 3 givenname: Chih-Fong surname: Tsai fullname: Tsai, Chih-Fong – sequence: 4 givenname: Yu-Jung surname: Lin fullname: Lin, Yu-Jung |
ContentType | Journal Article |
DBID | AAYXX CITATION DOA |
DOI | 10.1080/08839514.2025.2468534 |
DatabaseName | CrossRef DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 1087-6545 |
ExternalDocumentID | oai_doaj_org_article_0bb162e14e68444091ac9a00ed044cf3 10_1080_08839514_2025_2468534 |
IEDL.DBID | DOA |
ISSN | 0883-9514 |
IngestDate | Wed Aug 27 01:31:09 EDT 2025 Sun Jul 06 05:04:22 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
LinkModel | DirectLink |
ORCID | 0009-0001-3790-0980 |
OpenAccessLink | https://doaj.org/article/0bb162e14e68444091ac9a00ed044cf3 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_0bb162e14e68444091ac9a00ed044cf3 crossref_primary_10_1080_08839514_2025_2468534 |
PublicationCentury | 2000 |
PublicationDate | 2025-12-31 |
PublicationDateYYYYMMDD | 2025-12-31 |
PublicationDate_xml | – month: 12 year: 2025 text: 2025-12-31 day: 31 |
PublicationDecade | 2020 |
PublicationTitle | Applied artificial intelligence |
PublicationYear | 2025 |
Publisher | Taylor & Francis Group |
Publisher_xml | – name: Taylor & Francis Group |
SourceID | doaj crossref |
SourceType | Open Website Index Database |
Title | Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques |
URI | https://doaj.org/article/0bb162e14e68444091ac9a00ed044cf3 |
Volume | 39 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Directory of Open Access Journals |
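
The abstract describes pooling synthetic minority samples from several oversampling algorithms and then selecting among them. The record does not say how the SEO methods make that selection, so the following is only a minimal sketch of the general idea, not the authors' method. Everything in it is an assumption made for illustration: the two toy generators `smote_like` and `jitter` stand in for real oversamplers such as SMOTE or ADASYN, the 2-D points stand in for word-embedding features, and the selection rule in `ensemble_oversample` (keep the candidates farthest from the majority class) is hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k, rng):
    """Toy SMOTE-style generator: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def jitter(minority, n_new, scale, rng):
    """Hypothetical second generator: Gaussian noise around minority points."""
    return [
        tuple(x + rng.gauss(0.0, scale) for x in rng.choice(minority))
        for _ in range(n_new)
    ]


def ensemble_oversample(minority, majority, n_needed, rng):
    """Pool candidates from both generators, then keep n_needed of them.

    The selection rule is an illustrative assumption: prefer candidates
    whose nearest majority point is farthest away.
    """
    candidates = smote_like(minority, n_needed, k=2, rng=rng)
    candidates += jitter(minority, n_needed, scale=0.1, rng=rng)

    def margin(candidate):
        return min(math.dist(candidate, m) for m in majority)

    candidates.sort(key=margin, reverse=True)
    return candidates[:n_needed]


rng = random.Random(0)
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

# Generate just enough synthetic minority points to balance the classes.
n_needed = len(majority) - len(minority)
new_points = ensemble_oversample(minority, majority, n_needed, rng)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

In practice the generators would likely come from an oversampling library and the balanced set would be passed to a classifier scored with AUC and F1, as the abstract reports; the sketch only shows the pool-then-select structure.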