Cost-Balance Setting of MapReduce and Spark-Based Architectures for SVM

Bibliographic Details
Published in: Applications of Computational Intelligence, Vol. 833, pp. 137-149
Main Authors: Giraldo Londoño, Mario Alberto; Duitama, John Freddy; Arias-Londoño, Julián David
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2019
Series: Communications in Computer and Information Science

Summary: Support Vector Machine (SVM) is a classifier widely used in machine learning because of its high generalization capacity. Sequential minimal optimization (SMO), its most popular implementation, scales somewhere between linearly and quadratically in the training-set size across a range of test problems, which makes training SVMs on large data sets computationally expensive. SVM implementations on distributed systems such as MapReduce and Spark have proven effective at reducing this cost; this paper analyzes how the data subset size and the number of map tasks affect SVM performance on MapReduce and Spark. A cost model is also proposed as a useful tool for setting the data subset size according to the available hardware and the data to be processed.
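The chapter's actual cost model is not reproduced in this record. As a rough illustration of the trade-off it targets, the sketch below assumes SMO training time on a subset of m samples scales as c * m**gamma with gamma between 1 and 2, and that map tasks run in waves limited by the available cores; all constants are made-up placeholders, not values from the chapter.

```python
import math

# Toy sketch of a cost model for choosing the number of map tasks (and hence
# the data subset size) when training SVM subsets in parallel. This is NOT the
# chapter's model: c, gamma, and overhead are hypothetical placeholders.

def per_task_cost(n_samples, n_tasks, c=1e-6, gamma=1.7):
    """Assumed SMO cost (seconds) for one map task's data subset."""
    subset_size = n_samples / n_tasks
    return c * subset_size ** gamma

def wall_time(n_samples, n_tasks, n_cores, c=1e-6, gamma=1.7, overhead=5.0):
    """Tasks run n_cores at a time; each wave pays a fixed scheduling overhead."""
    waves = math.ceil(n_tasks / n_cores)
    return waves * (per_task_cost(n_samples, n_tasks, c, gamma) + overhead)

def best_n_tasks(n_samples, n_cores, max_tasks=256, **kw):
    """Task count (hence subset size) minimizing the estimated wall time."""
    return min(range(1, max_tasks + 1),
               key=lambda p: wall_time(n_samples, p, n_cores, **kw))
```

For example, `best_n_tasks(1_000_000, 8)` balances the superlinear saving from smaller per-task subsets against the extra scheduling waves incurred on an 8-core cluster.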
ISBN: 3030030229
9783030030223
ISSN: 1865-0929
1865-0937
DOI: 10.1007/978-3-030-03023-0_12