A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors
Published in: Information Sciences, Vol. 565, pp. 438–455
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.07.2021
Summary: Developing techniques for the machine learning of a classifier from class-imbalanced data presents an important challenge. Among the existing methods for addressing this problem, SMOTE has been successful, has received great praise, and features an extensive range of practical applications. In this paper, we focus on SMOTE and its extensions, aiming to solve the most challenging issues, namely, the choice of the parameter k and the determination of the neighbor number of each sample. Hence, a synthetic minority oversampling technique with natural neighbors (NaNSMOTE) is proposed. In NaNSMOTE, the random difference between a selected base sample and one of its natural neighbors is used to generate synthetic samples. The main advantages of NaNSMOTE are that (a) it has an adaptive k value related to the data complexity; (b) samples of class centers have more neighbors to improve the generalization of synthetic samples, while border samples have fewer neighbors to reduce the error of synthetic samples; and (c) it can remove outliers. The effectiveness of NaNSMOTE is proven by comparing it with SMOTE and extended versions of SMOTE on real data sets.
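The scheme the summary describes can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation: the function names `natural_neighbors` and `nan_oversample` are hypothetical, and the adaptive-k search below is a simplified reading of the natural-neighbor idea (grow k until every point appears in some other point's k-NN list, then keep only mutual neighbor pairs). Synthetic samples are then generated SMOTE-style, by interpolating between a base minority sample and one of its natural neighbors; points with no natural neighbor are treated as outliers and skipped.

```python
import numpy as np

def natural_neighbors(X, max_k=None):
    # Simplified natural-neighbor search (an assumption, not the paper's exact
    # procedure): increase k until every point is a neighbor of someone, then
    # keep, for each point, the neighbors with which the k-NN relation is mutual.
    n = len(X)
    max_k = max_k or n - 1
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude self-distance
    order = np.argsort(d, axis=1)          # each row: indices sorted by distance
    in_degree = np.zeros(n, dtype=int)
    for k in range(1, max_k + 1):
        for j in order[:, k - 1]:          # k-th nearest neighbor of each point
            in_degree[j] += 1
        if np.all(in_degree > 0):          # every point is someone's neighbor
            break
    knn = [set(order[i, :k]) for i in range(n)]
    nbrs = [[j for j in knn[i] if i in knn[j]] for i in range(n)]
    return nbrs, k                         # mutual neighbors and the adaptive k

def nan_oversample(X_min, n_new, rng=None):
    # SMOTE-style interpolation restricted to natural neighbors.
    rng = rng or np.random.default_rng(0)
    nbrs, _ = natural_neighbors(X_min)
    bases = [i for i in range(len(X_min)) if nbrs[i]]  # drop outliers
    samples = []
    while len(samples) < n_new:
        i = rng.choice(bases)
        j = rng.choice(nbrs[i])
        gap = rng.random()                 # random point on the segment i -> j
        samples.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(samples)
```

Note how this realizes the adaptive-k property claimed in the abstract: dense (center) points accumulate many mutual neighbors before the loop terminates, sparse (border) points few, and isolated points none, so no synthetic sample is ever anchored on an outlier.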
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2021.03.041