Cost-sensitive learning methods for imbalanced data

Bibliographic Details
Published in: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1 - 8
Main Authors: Thai-Nghe, Nguyen; Gantner, Zeno; Schmidt-Thieme, Lars
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2010

Summary: Class imbalance is one of the challenging problems for machine learning algorithms. When learning from highly imbalanced data, most classifiers are overwhelmed by the majority-class examples, so the false negative rate tends to be high. Although researchers have introduced many methods to deal with this problem, including resampling techniques and cost-sensitive learning (CSL), most of them focus on only one of these techniques. This study presents two empirical methods that address class imbalance using both resampling and CSL. The first method combines and compares several sampling techniques with CSL using support vector machines (SVM). The second method proposes using CSL by optimizing the cost ratio (cost matrix) locally. Our experimental results on 18 imbalanced datasets from the UCI repository show that the first method can reduce the misclassification costs, and the second method can improve the classifier performance.
ISBN: 9781424469161, 1424469163
ISSN: 2161-4393
DOI: 10.1109/IJCNN.2010.5596486
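The summary above describes two ideas: (1) combining resampling with cost-sensitive SVM learning, and (2) selecting the cost ratio (cost matrix) by local optimization. The Python sketch below is only an illustration of those ideas and is not the authors' implementation; the synthetic dataset, the 10:1 default cost ratio, the candidate ratio grid, and the simple random-oversampling step are all assumptions. It uses scikit-learn's SVC, whose class_weight parameter scales the misclassification penalty per class.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Imbalanced toy data (~5% positives); a stand-in, not one of the 18 UCI sets.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def total_cost(y_true, y_pred, c_fn=10.0, c_fp=1.0):
    # Total misclassification cost; the c_fn : c_fp weighting is an assumed cost matrix.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return c_fn * fn + c_fp * fp

# Idea 1: random oversampling of the minority class plus a cost-sensitive SVM.
rng = np.random.default_rng(0)
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=3 * len(minority), replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])
svm_csl = SVC(class_weight={0: 1.0, 1: 10.0})  # 10:1 cost ratio is an assumption
svm_csl.fit(X_bal, y_bal)
print("oversampling + CSL, test cost:", total_cost(y_te, svm_csl.predict(X_te)))

# Idea 2: choose the cost ratio by cross-validated misclassification cost.
best_ratio, best_cv_cost = 1, np.inf
for ratio in [1, 2, 5, 10, 20, 50]:  # candidate ratios are illustrative
    cv_cost = 0.0
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    for tr_idx, va_idx in cv.split(X_tr, y_tr):
        clf = SVC(class_weight={0: 1.0, 1: float(ratio)})
        clf.fit(X_tr[tr_idx], y_tr[tr_idx])
        cv_cost += total_cost(y_tr[va_idx], clf.predict(X_tr[va_idx]))
    if cv_cost < best_cv_cost:
        best_ratio, best_cv_cost = ratio, cv_cost

final = SVC(class_weight={0: 1.0, 1: float(best_ratio)}).fit(X_tr, y_tr)
print("selected cost ratio:", best_ratio,
      "| test cost:", total_cost(y_te, final.predict(X_te)))

The second block searches the cost ratio by cross-validated total cost on the training data rather than by any globally fixed rule, which is one simple way to read "optimizing the cost ratio locally"; the paper's actual procedure may differ.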