Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques
Published in | Applied artificial intelligence Vol. 39; no. 1 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Taylor & Francis Group, 31.12.2025 |
Summary: | The rise of social media has amplified online sharing, making it necessary for businesses to understand public sentiment. Traditional sentiment analysis struggles with sarcasm detection and class imbalance. To address this, we introduce Synthetic Ensemble Oversampling methods (SEO) that effectively leverage the strengths of various oversampling algorithms. By incorporating ensemble learning principles into oversampling techniques, our proposed methods offer distinct strategies for selecting newly generated sarcastic data. In this study, we employ five oversampling algorithms: Synthetic Minority Oversampling TEchnique (SMOTE), Adaptive Synthetic Sampling (ADASYN), polynom-fit-SMOTE, Proximity Weighted Synthetic Sampling (ProWSyn), and SMOTE with Instance Prioritization and Filtering (SMOTE_IPF). We work with two imbalanced sarcasm detection datasets, iSarcasmEval and SARC-reduced, collected from Twitter and Reddit, respectively. After extracting features using Word2Vec, Global Vectors (GloVe), and FastText, we apply oversampling and ensemble techniques. Evaluated across six classifiers – Support Vector Machine, Decision Tree, Random Forest, Extreme Gradient Boosting, Logistic Regression, and BERT – the results demonstrate that the SEO2 framework consistently enhances classifier performance compared to single oversampling techniques. Notably, the Cluster Uncentered method frequently provides the best improvements across datasets, achieving significant gains in both AUC and F1 scores. These findings highlight the potential of ensemble-based oversampling in addressing class imbalance for sarcasm detection. |
---|---|
ISSN: | 0883-9514, 1087-6545 |
DOI: | 10.1080/08839514.2025.2468534 |