A priori synthetic over-sampling methods for increasing classification sensitivity in imbalanced data sets

Bibliographic Details
Published in	Expert systems with applications Vol. 66; pp. 124 - 135
Main Authors	Rivera, William A., Xanthopoulos, Petros
Format	Journal Article
Language	English
Published	Elsevier Ltd 30.12.2016
Subjects	Class imbalance; Classification; OUPS; SMOTE
Summary:	• Compare OUPS and Safe Level OUPS against popular SMOTE generalizations. • Safe Level OUPS resulted in the highest sensitivity and g-mean. • The OUPS modification performed moderately well within neural networks. • Safe Level OUPS improves prediction of noisy minority members using Linear SVM. Building accurate classifiers for predicting group membership is difficult when the data are skewed or imbalanced, which is typical of real-world data sets. As a result, the classifier tends to be biased towards the over-represented, or majority, group. Re-sampling techniques offer simple approaches for minimizing this effect. Over-sampling methods aim to combat class imbalance by increasing the number of minority group samples, also referred to as members of the minority group. Over the last decade, SMOTE-based methods have been used and extended to overcome this problem. There has been little emphasis on improving this approach by considering intrinsic properties of the data beyond class imbalance alone. In this paper we introduce modifications to the a priori based methods Safe Level OUPS and OUPS that improve sensitivity measures over competing SMOTE-based approaches such as the local neighborhood extension to SMOTE (LN-SMOTE), Borderline-SMOTE and Safe-Level-SMOTE.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2016.09.010
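
The summary describes over-sampling in general terms and reports results as sensitivity and g-mean. The sketch below illustrates both ideas under stated assumptions. It uses plain SMOTE-style interpolation between a minority point and one of its nearest minority neighbours; it is not OUPS or Safe Level OUPS, whose details this record does not give. The function names, the parameters `k`, `n_new` and `seed`, and the toy data are hypothetical, and g-mean is taken here in its common form as the geometric mean of sensitivity and specificity.

```python
import math
import random


def smote_like_sample(minority, k=3, n_new=5, seed=0):
    """Create synthetic minority points by SMOTE-style interpolation.

    Assumes `minority` is a list of at least two numeric tuples.
    This is generic SMOTE-style sampling, not the paper's OUPS variants.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def sensitivity_and_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, g-mean) for binary labels.

    Assumes both classes occur in `y_true`; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # true positive rate on the minority class
    specificity = tn / (tn + fp)  # true negative rate on the majority class
    return sensitivity, math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy minority class: four corners of the unit square (illustrative only)
    minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote_like_sample(minority_points, k=2, n_new=3))
    print(sensitivity_and_gmean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Because each synthetic point lies on a segment between two existing minority points, it never leaves the region those points span. Variants such as Borderline-SMOTE and Safe-Level-SMOTE differ mainly in how they choose which points to interpolate between.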