Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques

Bibliographic Details
Published in	Applied artificial intelligence Vol. 39; no. 1
Main Authors	Hu, Ya-Han, Liu, Ting-Hsuan, Tsai, Chih-Fong, Lin, Yu-Jung
Format	Journal Article
Language	English
Published	Taylor & Francis Group 31.12.2025
Summary:	The rise of social media has amplified online sharing, necessitating businesses to comprehend public sentiment. Traditional sentiment analysis struggles with sarcasm detection and class imbalance. To address this, we introduce Synthetic Ensemble Oversampling methods (SEO) that effectively leverage the strengths of various oversampling algorithms. By incorporating ensemble learning principles into oversampling techniques, our proposed methods offer distinct strategies for selecting newly generated sarcastic data. In this study, we employ five oversampling algorithms: Synthetic Minority Oversampling TEchnique (SMOTE), Adaptive Synthetic Sampling (ADASYN), polynom-fit-SMOTE, Proximity Weighted Synthetic Sampling (ProWSyn), and SMOTE with Instance Prioritization and Filtering (SMOTE_IPF). We work with two imbalanced sarcasm detection datasets, iSarcasmEval and SARC-reduced, collected from Twitter and Reddit. After extracting features using Word2Vec, Global Vectors (GloVe), and FastText, we apply oversampling and ensemble techniques. Evaluated across six classifiers – Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, Logistic Regression, and BERT – the results demonstrate that the SEO2 framework consistently enhances classifier performance compared to single oversampling techniques. Notably, the Cluster Uncentered method frequently provides the best improvements across datasets, achieving significant gains in both AUC and F1 scores. These findings highlight the potential of ensemble-based oversampling in addressing class imbalance for sarcasm detection.
ISSN:	0883-9514 1087-6545
DOI:	10.1080/08839514.2025.2468534
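
Illustration:	The summary describes pooling synthetic minority samples from several oversamplers and then selecting among them. The sketch below is only a rough illustration of that general idea, not the authors' SEO procedure. It assumes NumPy only; the function names smote_like and ensemble_oversample are hypothetical; a simple SMOTE-style interpolation run with different neighbourhood sizes stands in for the five algorithms named above; and uniform random selection stands in for the paper's selection strategies (such as Cluster Uncentered), whose details are not given in this record.

```python
import numpy as np


def smote_like(X_min, n_new, k, rng):
    """SMOTE-style interpolation: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]         # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def ensemble_oversample(X, y, minority=1, ks=(3, 5), seed=0):
    """Hypothetical ensemble step: pool candidates from several generators,
    then keep just enough of them to balance the two classes."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_needed = int((y != minority).sum() - len(X_min))
    # Assumption: each k plays the role of a different oversampling algorithm.
    pool = np.vstack([smote_like(X_min, n_needed, k, rng) for k in ks])
    # Assumption: uniform random selection replaces the paper's strategies.
    keep = rng.choice(len(pool), size=n_needed, replace=False)
    X_bal = np.vstack([X, pool[keep]])
    y_bal = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_bal, y_bal


# Toy data standing in for embedded text features: 90 majority, 10 minority.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (90, 4)), rng.normal(2, 1, (10, 4))])
y = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = ensemble_oversample(X, y)
print(np.bincount(y_bal))  # class counts after balancing: 90 and 90
```

In practice the features would come from Word2Vec, GloVe, or FastText embeddings, and the candidate generators would be real oversampling implementations rather than this stand-in.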