An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets
Published in | Applied soft computing Vol. 83; p. 105662 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.10.2019 |
Summary: | Learning and mining from imbalanced datasets have gained increasing interest in recent years. One simple but efficient way to improve the performance of standard machine learning techniques on imbalanced datasets is the synthetic generation of minority samples. This paper presents and discusses a detailed empirical comparison of 85 variants of minority oversampling techniques, evaluated on 104 imbalanced datasets. The goal of the work is to set a new baseline in the field, determine the oversampling principles leading to the best results under general circumstances, and give guidance to practitioners on which techniques to use with certain types of datasets.
• The best performing oversamplers are identified through empirical evaluation.
• The best performing principles are identified through empirical evaluation.
• The top performers were found to depend slightly on characteristics of datasets. |
---|---|
ISSN: | 1568-4946 1872-9681 |
DOI: | 10.1016/j.asoc.2019.105662 |
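
The summary describes synthetic generation of minority samples without naming a specific method. As an illustration only, the sketch below uses SMOTE-style interpolation, one well-known oversampling approach; this record does not state which techniques the paper evaluates, so it should not be read as the authors' method. The function name `oversample_minority`, its parameters, and the toy data are all hypothetical.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy imbalanced data (made-up values): 4 minority samples vs. 10 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10
new_points = oversample_minority(minority, majority_count - len(minority))
balanced_minority = minority + new_points
```

After the call, `balanced_minority` holds as many samples as the majority class, and every synthetic point lies between two existing minority samples. Real oversamplers differ mainly in how they choose the base sample, the neighbour, and the interpolation weight.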