Robust and Distributionally Robust Optimization Models for Linear Support Vector Machine
Published in | Computers & operations research Vol. 147; p. 105930 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.11.2022 |
Summary: | In this paper we present novel data-driven optimization models for Support Vector Machines (SVM), with the aim of linearly separating two sets of points that have non-disjoint convex closures. Traditional classification algorithms assume that the training data points are always known exactly. However, real-life data are often subject to noise. To handle such uncertainty, we formulate robust models with uncertainty sets in the form of hyperrectangles or hyperellipsoids, and propose a moment-based distributionally robust optimization model enforcing limits on first-order deviations along principal directions. All the formulations reduce to convex programs. The efficiency of the new classifiers is evaluated on real-world databases. Experiments show that robust classifiers are especially beneficial for data sets with a small number of observations. As the dimension of the data sets increases, the behavior of the features is gradually learned and higher levels of out-of-sample accuracy can be achieved via the considered distributionally robust optimization method. Overall, compared with deterministic approaches, the proposed formulations allow a trade-off to be found between increasing average accuracy and protecting against uncertainty.
• We present data-driven models for Support Vector Machine under uncertainty.
• Robust and moment-based distributionally robust optimization models are formulated.
• Experiments are performed on real-world databases for several fields of applications.
• Managerial insights on how to choose among robust formulations are provided. |
---|---|
ISSN: | 0305-0548 1873-765X |
DOI: | 10.1016/j.cor.2022.105930 |
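
The summary states that the robust models use hyperrectangular or hyperellipsoidal uncertainty sets and reduce to convex programs. This record does not reproduce the paper's formulations, so the following is only a minimal sketch of the standard textbook idea behind such reductions, under assumptions of our own: each training point x may be perturbed by any u in an axis-aligned box (`||u||_inf <= radius`) or a Euclidean ball (`||u||_2 <= radius`), and the worst-case margin then has a closed form through the dual norm of w. The function names, the `radius` parameter, and the numbers are hypothetical and chosen purely for illustration.

```python
import itertools

import numpy as np


def worst_case_margin(w, b, x, y, radius, shape="box"):
    """Smallest value of y * (w @ (x + u) + b) over all perturbations u in the set.

    Assumed (illustrative) uncertainty sets:
      "box":  ||u||_inf <= radius, whose dual norm is the l1 norm
      "ball": ||u||_2   <= radius, whose dual norm is the l2 norm
    """
    nominal = y * (w @ x + b)
    if shape == "box":
        penalty = radius * np.linalg.norm(w, 1)
    elif shape == "ball":
        penalty = radius * np.linalg.norm(w, 2)
    else:
        raise ValueError(f"unknown shape: {shape}")
    return nominal - penalty


def brute_force_box_margin(w, b, x, y, radius):
    """Check the box case by enumerating corners.

    A linear function over a box attains its minimum at a vertex,
    so enumerating the 2^n corners is exact (only feasible for small n).
    """
    corners = itertools.product([-radius, radius], repeat=len(x))
    return min(y * (w @ (x + np.array(u)) + b) for u in corners)


# Hypothetical classifier and data point, for illustration only.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 1.0])
y = 1
radius = 0.1

box_margin = worst_case_margin(w, b, x, y, radius, shape="box")
ball_margin = worst_case_margin(w, b, x, y, radius, shape="ball")
check = brute_force_box_margin(w, b, x, y, radius)
print(box_margin, ball_margin, check)
```

Requiring the worst-case margin to be at least `1 - slack` for every training point gives a constraint that is convex in `(w, b, slack)`, which is consistent with the summary's statement that the formulations reduce to convex programs. The box of the same radius contains the ball, so its worst-case margin is never larger.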