Cost-Balance Setting of MapReduce and Spark-Based Architectures for SVM

Bibliographic Details
Published in: Applications of Computational Intelligence, Vol. 833, pp. 137-149
Main Authors: Giraldo Londoño, Mario Alberto; Duitama, John Freddy; Arias-Londoño, Julián David
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2019
Series: Communications in Computer and Information Science

Summary: Support Vector Machine (SVM) is a classifier widely used in machine learning because of its high generalization capacity. Sequential minimal optimization (SMO), its most popular implementation, scales somewhere between linearly and quadratically in the training-set size across a range of test problems, which makes training SVMs on large data sets computationally expensive. SVM implementations on distributed systems such as MapReduce and Spark have proven effective at reducing this cost; this paper analyzes how the data subset size and the number of map tasks affect SVM performance on MapReduce and Spark. A cost model is also proposed as a useful tool for setting the data subset size according to the available hardware and the data to be processed.
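The chapter's actual cost model is not reproduced in this record. As a rough illustration of the trade-off it targets, the sketch below assumes SMO training time on a subset of m samples scales as c * m**gamma with gamma between 1 and 2, and that map tasks run in waves limited by the available cores; all constants are made-up placeholders, not values from the chapter.

```python
import math

# Toy sketch of a cost model for choosing the number of map tasks (and hence
# the data subset size) when training SVM subsets in parallel. This is NOT the
# chapter's model: c, gamma, and overhead are hypothetical placeholders.

def per_task_cost(n_samples, n_tasks, c=1e-6, gamma=1.7):
    """Assumed SMO cost (seconds) for one map task's data subset."""
    subset_size = n_samples / n_tasks
    return c * subset_size ** gamma

def wall_time(n_samples, n_tasks, n_cores, c=1e-6, gamma=1.7, overhead=5.0):
    """Tasks run n_cores at a time; each wave pays a fixed scheduling overhead."""
    waves = math.ceil(n_tasks / n_cores)
    return waves * (per_task_cost(n_samples, n_tasks, c, gamma) + overhead)

def best_n_tasks(n_samples, n_cores, max_tasks=256, **kw):
    """Task count (hence subset size) minimizing the estimated wall time."""
    return min(range(1, max_tasks + 1),
               key=lambda p: wall_time(n_samples, p, n_cores, **kw))
```

For example, `best_n_tasks(1_000_000, 8)` balances the superlinear saving from smaller per-task subsets against the extra scheduling waves incurred on an 8-core cluster.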
ISBN: 3030030229
9783030030223
ISSN: 1865-0929
1865-0937
DOI: 10.1007/978-3-030-03023-0_12