On Subsampling Procedures for Support Vector Machines

Bibliographic Details
Published in	Mathematics (Basel) Vol. 10; no. 20; p. 3776
Main Authors	Bárcenas, Roberto; Gonzalez-Lima, Maria; Ortega, Joaquin; Quiroz, Adolfo
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.10.2022
Subjects	Algorithms; bagging; big data; Classification; Classifiers; Datasets; Food science; Importance sampling; Machine learning; Mathematical research; Methods; Support vector machines; Theorems; Training; Variables; Germany
Summary:	Herein, we present theoretical results that provide insight into how effectively subsampling methods reduce the number of instances required in the training stage when support vector machines (SVMs) are applied to classification in big data scenarios. Our main theorem states that, under some conditions, there exists with high probability a feasible solution to the SVM problem for a randomly chosen training subsample, with the corresponding classifier as close as desired (in terms of classification error) to the classifier obtained from training with the complete dataset. The main theorem also reflects the curse of dimensionality, in that the assumptions required for the results are much more restrictive in high dimensions; thus, subsampling methods are expected to perform better in lower dimensions. Additionally, we propose a subsampling method based on importance sampling and bagging that extends the nearest-neighbor ideas presented in previous work. On several benchmark examples, the proposed method solves the SVM problem faster than available state-of-the-art techniques, without significant loss in accuracy.
ISSN:	2227-7390
DOI:	10.3390/math10203776