An oversampling method based on adaptive artificial immune network and SMOTE

Bibliographic Details
Published in: Genetic programming and evolvable machines, Vol. 26, No. 2
Main Authors: Bai, Lin; Sun, Mengchen; Jiang, Xianlin; Liu, Jingxuan; Liu, Jialu; Pan, Xiaoying
Format: Journal Article
Language: English
Published: New York: Springer US, 01.12.2025 (Springer Nature B.V.)

Summary: The problem of data imbalance often causes classification algorithms to overlook the minority classes, which are usually more valuable in practical applications. Consequently, this impacts classification performance. Sampling strategies balance class distribution at the data level, making them an effective means of improving classifier performance. However, most existing methods focus on achieving a balance in sample quantity between classes while neglecting the impact of the original data's spatial distribution in the feature space on classification outcomes. A new oversampling algorithm based on an Adaptive Artificial Immune Network and SMOTE (ADAIN-SMOTE) is proposed in this paper. ADAIN incorporates a global mutation operator and an adaptive network suppression operator into the Artificial Immune Network, enhancing the evolutionary network's ability to learn from the original data and achieving adaptive network compression. Ultimately, it evolves a network structure capable of mapping the distribution of the original data, which is then used to augment the minority class. SMOTE is subsequently applied to oversample the new minority class. The resulting synthetic minority samples avoid excessive sampling of noisy or irrelevant data and better preserve the true distribution of the original data. The method is applicable across various classification models and significantly enhances classification performance. Comparative experiments on 26 datasets, 5 classifiers, and 8 oversampling algorithms show that the proposed algorithm ranks first in average F1, G-mean, PR_AUC, and MCC.
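The second stage of the pipeline described above is standard SMOTE: each synthetic minority sample is a random interpolation between a minority point and one of its k nearest minority-class neighbours. The sketch below illustrates only that generic SMOTE step with NumPy; it is not the authors' ADAIN component, and the function name and parameters are illustrative assumptions, not from the paper.

```python
import numpy as np

def smote(minority, n_synthetic, k=5, rng=None):
    """Minimal SMOTE sketch (illustrative, not the paper's ADAIN-SMOTE):
    for each synthetic sample, pick a random minority point, pick one of
    its k nearest minority neighbours, and interpolate between the two."""
    rng = np.random.default_rng(rng)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point as its own neighbour
    k = min(k, n - 1)
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours

    out = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(n)              # base minority point
        b = nn[a, rng.integers(k)]       # one of its k nearest neighbours
        gap = rng.random()               # interpolation fraction in [0, 1)
        out[i] = X[a] + gap * (X[b] - X[a])
    return out
```

Because every synthetic point is a convex combination of two existing minority points, new samples stay inside the minority class's local neighbourhoods; ADAIN-SMOTE's contribution, per the abstract, is running this step on an immune-network-augmented minority set rather than on the raw (possibly noisy) minority samples.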
ISSN: 1389-2576, 1573-7632
DOI: 10.1007/s10710-025-09516-7