MapReduce Tuning to Improve Distributed Machine Learning Performance

In this paper, we show how MapReduce parameters affect the distributed processing of machine learning programs built on machine learning libraries such as Hadoop Mahout and Spark MLlib. We constructed a virtualized cluster on top of Docker containers and measured distributed machine learning performance while varying Hadoop parameters such as the replication factor, block size, and memory buffer size.
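
As a concrete illustration of the kind of tuning the abstract describes, the following sketch sets the three Hadoop parameters mentioned (replication factor, HDFS block size, and map-side sort buffer) through Hadoop's standard Configuration API before submitting a MapReduce job. The configuration keys are the usual Hadoop 2.x names; the chosen values and the job skeleton are illustrative assumptions, not settings reported in the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TunedJobSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Replication factor for files the job writes to HDFS (cluster default is usually 3).
        conf.setInt("dfs.replication", 2);

        // HDFS block size in bytes; it determines the default input split size
        // and therefore how many map tasks the job spawns. 256 MB is an illustrative value.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        // Map-side sort buffer in MB; a larger buffer means fewer spills to local disk.
        conf.setInt("mapreduce.task.io.sort.mb", 256);

        // Minimal job skeleton: the mapper/reducer classes for the actual ML workload
        // (e.g. a Mahout driver) would be configured here.
        Job job = Job.getInstance(conf, "tuned-ml-job");
        job.setJarByClass(TunedJobSubmit.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}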

Bibliographic Details
Published in: 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pp. 198-200
Main Authors: Jeon, SungHwan; Chung, Haejin; Choi, Wonseok; Shin, Heeseong; Chun, Jonghoon; Kim, Jin Taek; Nah, Yunmook
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2018
DOI: 10.1109/AIKE.2018.00045