Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification

Bibliographic Details
Published in: Applied Soft Computing, Vol. 67, pp. 94-105
Main Authors: Maldonado, Sebastián; López, Julio
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.06.2018
Summary:
Highlights:
• Novel embedded feature selection approach for SVM on imbalanced datasets.
• Optimization is performed via Quasi-Newton and Armijo search.
• Best classification performance is achieved in experiments on benchmark datasets.

In this work, we propose a novel feature selection approach designed to deal with two major issues in machine learning, namely class imbalance and high dimensionality. The proposed embedded strategy penalizes the cardinality of the feature set via the scaling factors technique, and is used with two support vector machine (SVM) formulations designed to deal with the class-imbalance problem, namely Cost-Sensitive SVM and Support Vector Data Description. The proposed concave formulations are solved via a Quasi-Newton update and Armijo line search. We performed experiments on 12 highly imbalanced microarray datasets using linear and Gaussian kernels, achieving the highest average predictive performance with our approach compared with the most well-known feature selection strategies.
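The summary names two optimization ingredients, a Quasi-Newton update and an Armijo line search, applied to scaling factors whose cardinality is penalized. The following is only a minimal sketch of how those two pieces fit together, not the authors' implementation. The objective is a hypothetical toy stand-in (a quadratic fit term plus an exponential surrogate for the count of non-zero scaling factors); the names `TARGET`, `LAM`, `BETA`, `objective`, `gradient`, `armijo`, `bfgs_update`, and `minimize` are all assumptions made for illustration, and the BFGS formula is used here as one common Quasi-Newton update, since the record does not say which update the paper uses.

```python
import math

# Toy stand-in objective (an assumption, not the paper's formulation):
# a quadratic fit term plus a smooth surrogate for the number of
# non-zero scaling factors sigma_j.
TARGET = [1.0, 0.05, 0.8]
LAM, BETA = 0.1, 5.0

def objective(sigma):
    fit = sum((s - t) ** 2 for s, t in zip(sigma, TARGET))
    penalty = sum(1.0 - math.exp(-BETA * s * s) for s in sigma)
    return fit + LAM * penalty

def gradient(sigma):
    return [2.0 * (s - t) + LAM * 2.0 * BETA * s * math.exp(-BETA * s * s)
            for s, t in zip(sigma, TARGET)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def armijo(f, x, fx, g, d, c1=1e-4, shrink=0.5, max_halvings=60):
    """Backtrack from step 1 until f(x + a*d) <= f(x) + c1 * a * g.d."""
    slope = dot(g, d)  # negative when d is a descent direction
    alpha = 1.0
    for _ in range(max_halvings):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        f_trial = f(trial)
        if f_trial <= fx + c1 * alpha * slope:
            break
        alpha *= shrink
    # If no step satisfied the condition, the last (tiny) trial is returned.
    return trial, f_trial

def bfgs_update(H, s, y):
    """BFGS update of the inverse-Hessian approximation H (symmetric)."""
    ys = dot(y, s)
    if ys <= 1e-12:  # skip the update if the curvature condition fails
        return H
    rho = 1.0 / ys
    n = len(s)
    Hy = [dot(row, y) for row in H]
    coef = rho * rho * dot(y, Hy) + rho
    return [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]) + coef * s[i] * s[j]
             for j in range(n)] for i in range(n)]

def minimize(f, grad, x0, tol=1e-6, max_iter=200):
    n = len(x0)
    x, fx, g = list(x0), f(x0), grad(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        d = [-dot(row, g) for row in H]  # quasi-Newton search direction
        x_new, f_new = armijo(f, x, fx, g, d)
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        H = bfgs_update(H, s, y)
        x, fx, g = x_new, f_new, g_new
    return x

# Scaling factors start at 1 (all features active) in this toy setup.
sigma = minimize(objective, gradient, [1.0, 1.0, 1.0])
```

In this toy run the second scaling factor is pulled toward zero by the penalty, which is the qualitative behaviour an embedded method relies on when it discards features with negligible scaling factors. The actual formulations in the paper couple the scaling factors with the SVM training problem, which this sketch does not attempt.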
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2018.02.051