A three-way decision ensemble method for imbalanced data oversampling

Bibliographic Details
Published in: International journal of approximate reasoning, Vol. 107, pp. 1-16
Main Authors: Yan, Yuan Ting; Wu, Zeng Bao; Du, Xiu Quan; Chen, Jie; Zhao, Shu; Zhang, Yan Ping
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.04.2019
Summary: Synthetic Minority Over-sampling Technique (SMOTE) is an effective method for imbalanced data classification, and many variants of SMOTE have been proposed in the past decade. These methods mainly focus on how to select the crucial minority samples, and they implicitly assume that the selection of key minority samples is binary. As a result, the cost of key sample selection is seldom considered. To address this, the paper proposes a three-way decision model (CTD) that accounts for the differences in the cost of selecting key samples. CTD first uses the Constructive Covering Algorithm (CCA) to divide the minority samples into several covers. Then, a three-way decision model for key sample selection is constructed according to the density of the covers on the minority samples. Finally, the corresponding thresholds α and β of CTD are obtained from the pattern of the cover distribution on the minority samples, after which key samples can be selected for SMOTE oversampling. Moreover, because CCA selects cover centers at random and may therefore produce a non-optimal result, an ensemble model based on CTD (CTDE) is further proposed to improve the performance of CTD. Numerical experiments on 10 imbalanced datasets show that the proposed method outperforms the comparison methods. Constructing an ensemble of three-way-decision-based key sample selections effectively improves model performance compared with several state-of-the-art methods.
ISSN: 0888-613X, 1873-4731
DOI:10.1016/j.ijar.2018.12.011
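
The summary names two generic building blocks: a three-way decision governed by thresholds α and β, and SMOTE oversampling. The sketch below illustrates only those generic ideas and is not the paper's CTD or CTDE procedure. The function names, the per-cover density scores, and the threshold values are hypothetical, and the record does not say how the paper computes densities, derives α and β, or which region supplies the key samples.

```python
import random


def three_way_partition(scores, alpha, beta):
    """Generic three-way decision rule (assumed form, not taken from the paper).

    score >= alpha -> positive region
    score <= beta  -> negative region
    otherwise      -> boundary region (decision deferred)
    Returns three lists of indices into `scores`.
    """
    if not beta < alpha:
        raise ValueError("expected beta < alpha")
    positive, boundary, negative = [], [], []
    for idx, score in enumerate(scores):
        if score >= alpha:
            positive.append(idx)
        elif score <= beta:
            negative.append(idx)
        else:
            boundary.append(idx)
    return positive, boundary, negative


def smote_interpolate(x, neighbor, rng):
    """SMOTE-style synthetic sample: a random point on the segment x -> neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


# Hypothetical per-cover density scores and thresholds, for illustration only.
densities = [0.9, 0.5, 0.1, 0.7]
positive, boundary, negative = three_way_partition(densities, alpha=0.7, beta=0.3)

# One synthetic minority point between a sample and one of its minority neighbors.
synthetic = smote_interpolate([0.0, 0.0], [1.0, 2.0], random.Random(0))
```

With these made-up values, covers 0 and 3 fall in the positive region, cover 1 in the boundary region, and cover 2 in the negative region. Unlike a binary keep-or-drop rule, the boundary region leaves room for a deferred decision, which is the contrast with binary selection that the summary draws.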