MapReduce Tuning to Improve Distributed Machine Learning Performance

In this paper, we show how MapReduce parameters affect the distributed processing of machine learning programs built on machine learning libraries such as Hadoop Mahout and Spark MLlib. We constructed a virtualized cluster on top of Docker containers and measured distributed machine learning performance while varying Hadoop parameters such as the replication factor, block size, and memory buffer size.
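
As a concrete illustration of the kind of tuning the abstract describes, the following sketch sets the three Hadoop parameters mentioned (replication factor, HDFS block size, and map-side sort buffer) through Hadoop's standard Configuration API before submitting a MapReduce job. The configuration keys are the usual Hadoop 2.x names; the chosen values and the job skeleton are illustrative assumptions, not settings reported in the paper.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TunedJobSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Replication factor for files the job writes to HDFS (cluster default is usually 3).
        conf.setInt("dfs.replication", 2);

        // HDFS block size in bytes; it determines the default input split size
        // and therefore how many map tasks the job spawns. 256 MB is an illustrative value.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        // Map-side sort buffer in MB; a larger buffer means fewer spills to local disk.
        conf.setInt("mapreduce.task.io.sort.mb", 256);

        // Minimal job skeleton: the mapper/reducer classes for the actual ML workload
        // (e.g. a Mahout driver) would be configured here.
        Job job = Job.getInstance(conf, "tuned-ml-job");
        job.setJarByClass(TunedJobSubmit.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}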

Bibliographic Details
Published in: 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pp. 198-200
Main Authors: Jeon, SungHwan; Chung, Haejin; Choi, Wonseok; Shin, Heeseong; Chun, Jonghoon; Kim, Jin Taek; Nah, Yunmook
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2018
DOI: 10.1109/AIKE.2018.00045