A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors

Bibliographic Details
Published in: Information Sciences, Vol. 565, pp. 438-455
Main Authors: Li, Junnan; Zhu, Qingsheng; Wu, Quanwang; Fan, Zhu
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.07.2021
Summary: Developing techniques for learning a classifier from class-imbalanced data presents an important challenge. Among the existing methods for addressing this problem, SMOTE has been successful, has received great praise, and features an extensive range of practical applications. In this paper, we focus on SMOTE and its extensions, aiming to solve their most challenging issues, namely, the choice of the parameter k and the determination of the number of neighbors of each sample. Hence, a synthetic minority oversampling technique with natural neighbors (NaNSMOTE) is proposed. In NaNSMOTE, the random difference between a selected base sample and one of its natural neighbors is used to generate synthetic samples. The main advantages of NaNSMOTE are that (a) it has an adaptive k value related to the data complexity; (b) samples near class centers have more neighbors, which improves the generalization of synthetic samples, while border samples have fewer neighbors, which reduces the error of synthetic samples; and (c) it can remove outliers. The effectiveness of NaNSMOTE is demonstrated by comparing it with SMOTE and extended versions of SMOTE on real data sets.
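The core idea described in the summary can be illustrated with a short sketch. The natural-neighbor search below grows k until every point is a neighbor of at least one other point, then keeps only mutual k-NN pairs; synthetic minority samples are then interpolated between a base sample and a randomly chosen natural neighbor, and samples with no natural neighbors (outliers) are never used as bases. This is an assumed, simplified reading of the approach based on the abstract and the natural-neighbor literature, not the authors' exact algorithm; all function names are hypothetical.

```python
import numpy as np

def natural_neighbors(X, max_k=20):
    """Find natural neighbors: grow k until every point has at least one
    reverse neighbor (a hypothetical stopping rule), then keep mutual pairs."""
    n = len(X)
    # Pairwise Euclidean distances and neighbor ordering (column 0 is the point itself).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    for k in range(1, max_k + 1):
        in_degree = np.zeros(n, dtype=int)
        for i in range(n):
            for j in order[i, 1:k + 1]:
                in_degree[j] += 1
        if np.all(in_degree > 0):  # every point is someone's k-NN: stop growing k
            break
    # Natural neighbors of i = points that are mutual k-NNs with i at the stable k.
    nn = [set() for _ in range(n)]
    for i in range(n):
        for j in order[i, 1:k + 1]:
            if i in order[j, 1:k + 1]:
                nn[i].add(int(j))
    return nn, k

def nan_smote(X_min, n_new, rng=None, max_k=20):
    """Generate n_new synthetic minority samples by interpolating between a
    base sample and one of its natural neighbors (sketch of the idea)."""
    rng = np.random.default_rng(rng)
    nn, _ = natural_neighbors(X_min, max_k=max_k)
    # Points with no natural neighbors are treated as outliers and skipped.
    bases = [i for i in range(len(X_min)) if nn[i]]
    synthetic = []
    while len(synthetic) < n_new:
        i = rng.choice(bases)
        j = rng.choice(list(nn[i]))
        gap = rng.random()  # random interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Note how the adaptive k emerges from the data itself: dense regions reach mutual-neighbor status at small k, while sparse border points only acquire natural neighbors as k grows, matching property (a) in the summary.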
ISSN: 0020-0255; 1872-6291
DOI: 10.1016/j.ins.2021.03.041