Cost-Balance Setting of MapReduce and Spark-Based Architectures for SVM
Published in | Applications of Computational Intelligence Vol. 833; pp. 137 - 149 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2019 |
Series | Communications in Computer and Information Science |
Summary: | Support Vector Machine (SVM) is a classifier widely used in machine learning because of its high generalization capacity. Sequential minimal optimization (SMO), its most popular implementation, scales somewhere between linearly and quadratically in the training set size for various test problems. As a result, training an SVM on large data sets is computationally expensive. SVM implementations on distributed systems such as MapReduce and Spark have been shown to reduce this cost. This paper analyzes how the data subset size and the number of mapping tasks affect SVM performance on MapReduce and Spark. It also proposes a cost model as a tool for setting the data subset size according to the available hardware and the data to be processed. |
---|---|
ISBN: | 3030030229 9783030030223 |
ISSN: | 1865-0929 1865-0937 |
DOI: | 10.1007/978-3-030-03023-0_12 |