A three-way decision ensemble method for imbalanced data oversampling
Published in | International journal of approximate reasoning, Vol. 107, pp. 1-16 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.04.2019 |
Summary: | Synthetic Minority Over-sampling Technique (SMOTE) is an effective method for imbalanced data classification, and many variants of SMOTE have been proposed in the past decade. These methods mainly focus on how to select the crucial minority samples and implicitly assume that the selection of key minority samples is binary; the cost of key sample selection is therefore seldom considered. To address this, the paper proposes a three-way decision model (CTD) that accounts for differences in the cost of selecting key samples. CTD first uses the Constructive Covering Algorithm (CCA) to divide the minority samples into several covers. Then, a three-way decision model for key sample selection is constructed according to the density of the covers on the minority samples. Finally, the corresponding thresholds α and β of CTD are obtained from the pattern of the cover distribution on the minority samples, after which key samples can be selected for SMOTE oversampling. Moreover, because CCA selects cover centers randomly and may therefore produce non-optimal results, an ensemble model based on CTD (CTDE) is further proposed to improve the performance of CTD. Numerical experiments on 10 imbalanced datasets show that the method outperforms the comparison methods: constructing an ensemble of three-way-decision-based key sample selection effectively improves performance compared with several state-of-the-art methods. |
---|---|
ISSN: | 0888-613X 1873-4731 |
DOI: | 10.1016/j.ijar.2018.12.011 |
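
The summary describes splitting covers into three regions by density using thresholds α and β, then oversampling the selected key samples with SMOTE. The following is only a rough sketch of that idea, not the paper's implementation. The function names `three_way_partition` and `smote_point` are hypothetical, the toy densities and thresholds are made up, and the rule that a higher density places a cover in the accepted region is an assumption, since the record does not state the direction of the rule or how α and β are derived. The interpolation step is the standard SMOTE construction of a synthetic point on the segment between a sample and one of its neighbours.

```python
import random


def three_way_partition(densities, alpha, beta):
    """Split cover indices into accept / defer / reject regions.

    Assumption (not stated in the record): a cover with density >= alpha
    is accepted as holding key samples, one with density <= beta is
    rejected, and anything in between is deferred (boundary region).
    """
    if not beta < alpha:
        raise ValueError("expected beta < alpha")
    accept, defer, reject = [], [], []
    for index, density in enumerate(densities):
        if density >= alpha:
            accept.append(index)
        elif density <= beta:
            reject.append(index)
        else:
            defer.append(index)
    return accept, defer, reject


def smote_point(sample, neighbor, rng):
    """Standard SMOTE interpolation between a sample and a neighbour."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]


# Toy values, purely illustrative.
densities = [0.9, 0.5, 0.1, 0.7]
accept, defer, reject = three_way_partition(densities, alpha=0.7, beta=0.3)

rng = random.Random(0)
synthetic = smote_point([0.0, 0.0], [1.0, 2.0], rng)
```

In this toy run, covers 0 and 3 fall in the accepted region, cover 1 is deferred, and cover 2 is rejected. An ensemble in the spirit of CTDE could repeat the cover construction with different random centers and combine the selections, but the record does not describe the combination rule.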