Parallelizing Machine Learning Optimization Algorithms on Distributed Data-Parallel Platforms with Parameter Server


Bibliographic Details
Published in: 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), pp. 126-133
Main Authors: Gu, Rong; Fan, Shiqing; Hu, Qiu; Yuan, Chunfeng; Huang, Yihua
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2018
Summary: In the big data era, machine learning optimization algorithms usually need to be designed and implemented on widely-used distributed computing platforms, such as Apache Hadoop, Spark, and Flink. However, these general distributed computing platforms themselves do not focus on parallelizing machine learning optimization algorithms. In this paper, we present a parallel optimization algorithm framework for scalable machine learning and empirically evaluate the synchronous Elastic Averaging SGD (EASGD) and other distributed SGD-based optimization algorithms. First, we design a distributed machine learning optimization algorithm framework based on Apache Spark by adopting the parameter server. Then, we design and implement the widely-used distributed synchronous EASGD and several other popular SGD-based optimization algorithms, such as Adadelta and Adam, on top of the framework. In addition, we evaluate the performance of the synchronous distributed EASGD against the other distributed optimization algorithms built on the same framework. Finally, to explore the optimal setting of the mini-batch size in large-scale distributed optimization, we further analyze the empirical linear scaling rule originally proposed for the single-node environment. Experimental results show that our parallel optimization algorithm framework achieves good flexibility and scalability. Moreover, the distributed synchronous EASGD running over the proposed framework attains competitive convergence performance and is about 5.7% faster than the other distributed SGD-based optimization algorithms. The experiments also verify that the empirical linear scaling rule holds well only until the mini-batch size exceeds a certain threshold on large-scale benchmarks in the distributed environment.
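The record does not include the update rule itself; as a rough illustration of the synchronous EASGD scheme summarized above, the sketch below shows one synchronous round in plain Python with NumPy. The function name sync_easgd_round, the variable names, and the hyperparameter values are illustrative assumptions, not the paper's implementation (which runs on Apache Spark with a parameter server).

    import numpy as np

    def sync_easgd_round(worker_params, center, grads, lr=0.01, rho=0.1):
        # One synchronous EASGD round (hypothetical sketch): each worker
        # takes a local SGD step that is elastically pulled toward the
        # shared center variable, then the parameter server moves the
        # center toward the workers' average.
        alpha = lr * rho  # elastic coefficient linking workers and center
        new_params = []
        for x, g in zip(worker_params, grads):
            # local step: mini-batch gradient plus elastic penalty term
            new_params.append(x - lr * g - alpha * (x - center))
        # server step: center variable absorbs the workers' deviations
        center = center + alpha * sum(x - center for x in worker_params)
        return new_params, center

    # toy usage with two workers and a 3-dimensional model
    center = np.zeros(3)
    workers = [np.ones(3), -np.ones(3)]
    grads = [np.full(3, 0.5), np.full(3, -0.5)]
    workers, center = sync_easgd_round(workers, center, grads)

The linear scaling rule analyzed in the paper is the common heuristic of multiplying the base learning rate by k when the global mini-batch size is multiplied by k; the abstract reports that, in the distributed setting, this heuristic holds well only up to a certain mini-batch size.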
DOI: 10.1109/PADSW.2018.8644533