Parallelizing Machine Learning Optimization Algorithms on Distributed Data-Parallel Platforms with Parameter Server
| Published in | 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), pp. 126 - 133 |
| --- | --- |
| Main Authors | , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.12.2018 |
Summary: In the big data era, machine learning optimization algorithms usually need to be designed and implemented on widely-used distributed computing platforms, such as Apache Hadoop, Spark, and Flink. However, these general distributed computing platforms themselves do not focus on parallelizing machine learning optimization algorithms. In this paper, we present a parallel optimization algorithm framework for scalable machine learning and empirically evaluate synchronous Elastic Averaging SGD (EASGD) and other distributed SGD-based optimization algorithms. First, we design a distributed machine learning optimization algorithm framework based on Apache Spark by adopting the parameter server. Then, we design and implement the widely-used distributed synchronous EASGD and several other popular SGD-based optimization algorithms, such as Adadelta and Adam, on top of the framework. In addition, we evaluate the performance of synchronous distributed EASGD against the other distributed optimization algorithms on the same framework. Finally, to explore the optimal setting of the mini-batch size in large-scale distributed optimization, we further analyze the empirical linear scaling rule originally proposed for the single-node environment. Experimental results show that our parallel optimization algorithm framework achieves good flexibility and scalability. Moreover, the distributed synchronous EASGD running on the proposed framework achieves competitive convergence performance and is about 5.7% faster than the other distributed SGD-based optimization algorithms. The experimental results also verify that the empirical linear scaling rule holds well only until the mini-batch size exceeds a certain threshold on large-scale benchmarks in the distributed environment.
DOI: 10.1109/PADSW.2018.8644533
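
The abstract above mentions two pieces that can be stated concretely: the synchronous EASGD update, in which each worker takes a local SGD step plus an elastic pull toward a center variable held on the parameter server, and the empirical linear scaling rule, which scales the learning rate in proportion to the mini-batch size. The NumPy sketch below illustrates both on a toy quadratic objective. It is not the paper's Spark/parameter-server implementation; all names (`easgd_round`, `linear_scaling_lr`, `alpha`, `base_batch`) and constants are illustrative assumptions.

```python
# Hedged sketch: one-process simulation of synchronous EASGD on a toy objective.
import numpy as np


def linear_scaling_lr(base_lr, base_batch, batch_size):
    """Empirical linear scaling rule: scale the learning rate in proportion
    to the mini-batch size (the paper finds this holds only up to a threshold)."""
    return base_lr * batch_size / base_batch


def easgd_round(workers, center, grads, lr=0.01, alpha=0.05):
    """One synchronous EASGD round (Zhang et al., 2015).

    Each worker takes a local SGD step plus an elastic pull toward the center
    variable held by the parameter server; the server then moves the center
    toward the workers.
    """
    new_workers = []
    elastic_sum = np.zeros_like(center)
    for x_i, g_i in zip(workers, grads):
        diff = x_i - center
        new_workers.append(x_i - lr * g_i - alpha * diff)  # worker update
        elastic_sum += diff
    new_center = center + alpha * elastic_sum               # server (center) update
    return new_workers, new_center


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_workers = 5, 4
    center = rng.normal(size=dim)
    workers = [center + 0.1 * rng.normal(size=dim) for _ in range(n_workers)]
    # Illustrative scaling: base lr 0.01 at batch 32, so lr 0.04 at batch 128.
    lr = linear_scaling_lr(base_lr=0.01, base_batch=32, batch_size=128)
    for _ in range(200):
        # Gradient of the toy objective f(x) = 0.5 * ||x||^2 is simply x.
        grads = [x for x in workers]
        workers, center = easgd_round(workers, center, grads, lr=lr)
    print("center norm after training:", np.linalg.norm(center))
```

In the paper's setting the per-worker updates would run as Spark tasks and the center-variable update would live on the parameter server; the single-process loop above only mirrors the algebra of one synchronous round.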