High-Responsive Scheduling with MapReduce Performance Prediction on Hadoop YARN

Bibliographic Details
Published in: 2016 IEEE 22nd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 238-247
Main Authors: Liu, Yang; Zeng, Yukun; Piao, Xuefeng
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2016
Summary: Hadoop is an open-source big data analysis platform that is widely used in both academia and industry. By decoupling resource management from the programming framework, the next generation of Hadoop, namely Hadoop YARN, accommodates various programming frameworks and can handle more kinds of workloads, such as interactive analysis and stream processing. However, most existing schedulers in YARN are designed for batch processing and do not prioritize per-job response time, which results in low responsiveness of the Hadoop platform. This paper proposes FSPY (Fair Sojourn Protocol in YARN), a scheduler that improves responsiveness while guaranteeing fairness. FSPY relies on job sizes, which are unknown a priori. Consequently, we also present a job size prediction mechanism for MapReduce. Experimental results show that our scheduler outperforms the Fair scheduler by 10x with respect to responsiveness under heavy workloads. Meanwhile, our prediction mechanism reaches an R² prediction accuracy of 0.97.
ISSN: 2325-1301
DOI: 10.1109/RTCSA.2016.51