Real-value negative selection over-sampling for imbalanced data set learning
Published in | Expert systems with applications Vol. 129; pp. 118 - 134 |
---|---|
Main Authors | , , , , , , , |
Format | Journal Article |
Language | English |
Published | New York, Elsevier Ltd, 01.09.2019; Elsevier BV |
Summary: | • As an over-sampling method, RNSO does not require minority class instances to be available.
• The generation of artificial minority class instances relies only on the majority class.
• RNSO can effectively avoid generating noisy instances and duplicated instances.
• RNS can solve imbalanced classification tasks without any modification of the classifier.
• The RNSO-based approach obtains better imbalanced classification results than other approaches.
Learning from imbalanced data sets poses a major challenge for the data mining community. Conventional machine learning algorithms perform poorly on imbalanced classification problems because they were originally designed for balanced class distributions. In this paper, we propose a new over-sampling technique that uses the real-value negative selection (RNS) procedure to generate artificial minority data without requiring any actual minority data. The generated minority data, together with the rare actual minority data when available, are combined with the majority data as input to a bi-class classification approach for learning. In the experiments, we demonstrate the effectiveness of RNS in avoiding problems often encountered by existing over-sampling methods, such as the generation of noisy instances and of almost duplicated instances in the same clusters. Moreover, extensive experimental results on imbalanced datasets from the UCI repository and on real-world imbalanced datasets show that the proposed hybrid approach achieves better performance, in terms of both the G-Mean and F-Measure evaluation metrics, than other existing imbalanced dataset classification techniques. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2019.04.011 |
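
The summary describes generating artificial minority instances from the majority class alone and scoring classifiers with G-Mean and F-Measure. The following is a minimal sketch of that idea, not the authors' implementation. It assumes features scaled to the unit hypercube, a single fixed self radius, and uniform random candidates. The names `rns_oversample`, `g_mean_and_f_measure`, `self_radius`, and `max_tries` are hypothetical and do not come from the paper.

```python
import math
import random


def rns_oversample(majority, n_new, self_radius, rng, max_tries=100_000):
    """Return up to n_new random points lying outside every majority hypersphere.

    Assumption: each majority ("self") sample covers a ball of radius
    self_radius, and features are already scaled to [0, 1].
    """
    dim = len(majority[0])
    generated = []
    for _ in range(max_tries):
        if len(generated) == n_new:
            break
        candidate = [rng.random() for _ in range(dim)]
        # Negative selection: discard any candidate that matches a self sample.
        if all(math.dist(candidate, x) > self_radius for x in majority):
            generated.append(candidate)
    return generated


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """Compute G-Mean and F-Measure, treating `positive` as the minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return g_mean, f_measure


# Toy usage: majority samples fill one corner of the unit square.
rng = random.Random(0)
majority = [[rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)] for _ in range(50)]
synthetic = rns_oversample(majority, n_new=20, self_radius=0.1, rng=rng)
```

In this sketch the synthetic points would be labeled as minority, merged with the majority data and any real minority samples, and passed to an ordinary two-class classifier. The choice of self radius controls how close the artificial instances may come to the majority region.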