Cost-sensitive learning methods for imbalanced data

Bibliographic Details
Published in: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1 - 8
Main Authors: Thai-Nghe, Nguyen; Gantner, Zeno; Schmidt-Thieme, Lars
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2010

Summary: Class imbalance is one of the challenging problems for machine learning algorithms. When learning from highly imbalanced data, most classifiers are overwhelmed by the majority-class examples, so the false negative rate tends to be high. Although researchers have introduced many methods to deal with this problem, including resampling techniques and cost-sensitive learning (CSL), most of them focus on only one of these techniques. This study presents two empirical methods that address class imbalance using both resampling and CSL. The first method combines and compares several sampling techniques with CSL using support vector machines (SVM). The second method proposes using CSL by optimizing the cost ratio (cost matrix) locally. Our experimental results on 18 imbalanced datasets from the UCI repository show that the first method can reduce the misclassification costs, and the second method can improve the classifier performance.
ISBN: 9781424469161, 1424469163
ISSN: 2161-4393
DOI: 10.1109/IJCNN.2010.5596486
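The summary above describes two ideas: (1) combining resampling with cost-sensitive SVM learning, and (2) selecting the cost ratio (cost matrix) by local optimization. The Python sketch below is only an illustration of those ideas and is not the authors' implementation; the synthetic dataset, the 10:1 default cost ratio, the candidate ratio grid, and the simple random-oversampling step are all assumptions. It uses scikit-learn's SVC, whose class_weight parameter scales the misclassification penalty per class.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Imbalanced toy data (~5% positives); a stand-in, not one of the 18 UCI sets.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def total_cost(y_true, y_pred, c_fn=10.0, c_fp=1.0):
    # Total misclassification cost; the c_fn : c_fp weighting is an assumed cost matrix.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return c_fn * fn + c_fp * fp

# Idea 1: random oversampling of the minority class plus a cost-sensitive SVM.
rng = np.random.default_rng(0)
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=3 * len(minority), replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])
svm_csl = SVC(class_weight={0: 1.0, 1: 10.0})  # 10:1 cost ratio is an assumption
svm_csl.fit(X_bal, y_bal)
print("oversampling + CSL, test cost:", total_cost(y_te, svm_csl.predict(X_te)))

# Idea 2: choose the cost ratio by cross-validated misclassification cost.
best_ratio, best_cv_cost = 1, np.inf
for ratio in [1, 2, 5, 10, 20, 50]:  # candidate ratios are illustrative
    cv_cost = 0.0
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    for tr_idx, va_idx in cv.split(X_tr, y_tr):
        clf = SVC(class_weight={0: 1.0, 1: float(ratio)})
        clf.fit(X_tr[tr_idx], y_tr[tr_idx])
        cv_cost += total_cost(y_tr[va_idx], clf.predict(X_tr[va_idx]))
    if cv_cost < best_cv_cost:
        best_ratio, best_cv_cost = ratio, cv_cost

final = SVC(class_weight={0: 1.0, 1: float(best_ratio)}).fit(X_tr, y_tr)
print("selected cost ratio:", best_ratio,
      "| test cost:", total_cost(y_te, final.predict(X_te)))

The second block searches the cost ratio by cross-validated total cost on the training data rather than by any globally fixed rule, which is one simple way to read "optimizing the cost ratio locally"; the paper's actual procedure may differ.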