A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors

Bibliographic Details
Published in: Information Sciences, Vol. 565, pp. 438-455
Main Authors: Li, Junnan; Zhu, Qingsheng; Wu, Quanwang; Fan, Zhu
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.07.2021
Summary: Developing techniques for learning a classifier from class-imbalanced data presents an important challenge. Among the existing methods for addressing this problem, SMOTE has been successful, has received great praise, and features an extensive range of practical applications. In this paper, we focus on SMOTE and its extensions, aiming to solve their most challenging issues, namely, the choice of the parameter k and the determination of the number of neighbors of each sample. Hence, a synthetic minority oversampling technique with natural neighbors (NaNSMOTE) is proposed. In NaNSMOTE, the random difference between a selected base sample and one of its natural neighbors is used to generate synthetic samples. The main advantages of NaNSMOTE are that (a) it has an adaptive k value related to the data complexity; (b) samples near class centers have more neighbors, which improves the generalization of synthetic samples, while border samples have fewer neighbors, which reduces the error of synthetic samples; and (c) it can remove outliers. The effectiveness of NaNSMOTE is demonstrated by comparing it with SMOTE and extended versions of SMOTE on real data sets.
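The core idea described in the summary can be illustrated with a short sketch. The natural-neighbor search below grows k until every point is a neighbor of at least one other point, then keeps only mutual k-NN pairs; synthetic minority samples are then interpolated between a base sample and a randomly chosen natural neighbor, and samples with no natural neighbors (outliers) are never used as bases. This is an assumed, simplified reading of the approach based on the abstract and the natural-neighbor literature, not the authors' exact algorithm; all function names are hypothetical.

```python
import numpy as np

def natural_neighbors(X, max_k=20):
    """Find natural neighbors: grow k until every point has at least one
    reverse neighbor (a hypothetical stopping rule), then keep mutual pairs."""
    n = len(X)
    # Pairwise Euclidean distances and neighbor ordering (column 0 is the point itself).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    for k in range(1, max_k + 1):
        in_degree = np.zeros(n, dtype=int)
        for i in range(n):
            for j in order[i, 1:k + 1]:
                in_degree[j] += 1
        if np.all(in_degree > 0):  # every point is someone's k-NN: stop growing k
            break
    # Natural neighbors of i = points that are mutual k-NNs with i at the stable k.
    nn = [set() for _ in range(n)]
    for i in range(n):
        for j in order[i, 1:k + 1]:
            if i in order[j, 1:k + 1]:
                nn[i].add(int(j))
    return nn, k

def nan_smote(X_min, n_new, rng=None, max_k=20):
    """Generate n_new synthetic minority samples by interpolating between a
    base sample and one of its natural neighbors (sketch of the idea)."""
    rng = np.random.default_rng(rng)
    nn, _ = natural_neighbors(X_min, max_k=max_k)
    # Points with no natural neighbors are treated as outliers and skipped.
    bases = [i for i in range(len(X_min)) if nn[i]]
    synthetic = []
    while len(synthetic) < n_new:
        i = rng.choice(bases)
        j = rng.choice(list(nn[i]))
        gap = rng.random()  # random interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Note how the adaptive k emerges from the data itself: dense regions reach mutual-neighbor status at small k, while sparse border points only acquire natural neighbors as k grows, matching property (a) in the summary.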
ISSN: 0020-0255; 1872-6291
DOI: 10.1016/j.ins.2021.03.041