High-Responsive Scheduling with MapReduce Performance Prediction on Hadoop YARN

Bibliographic Details
Published in	2016 IEEE 22nd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 238-247
Main Authors	Liu, Yang; Zeng, Yukun; Piao, Xuefeng
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2016
Subjects	Containers; fairness; job size prediction; Processor scheduling; Programming; Resource management; responsiveness; scheduling; Time factors; Yarn

More Information
Summary:	Hadoop is an open-source big data analysis platform that is widely used in both academia and industry. By decoupling resource management from the programming framework, the next generation of Hadoop, namely Hadoop YARN, accommodates various programming frameworks and can handle more kinds of workloads, such as interactive analysis and stream processing. However, most existing schedulers in YARN are designed for batch processing and do not take per-job response time into account, which results in low responsiveness of the Hadoop platform. This paper proposes the FSPY (Fair Sojourn Protocol in YARN) scheduler to improve responsiveness while guaranteeing fairness. FSPY relies on job sizes, which are unknown a priori. Consequently, we also present a job size prediction mechanism for MapReduce. Experimental results show that our scheduler outperforms the Fair scheduler by 10x with respect to responsiveness under heavy workloads. Meanwhile, our prediction mechanism reaches an R² prediction accuracy of 0.97.
ISSN:	2325-1301
DOI:	10.1109/RTCSA.2016.51
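
Note on the R² figure in the summary: R² conventionally denotes the coefficient of determination, 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The minimal Python sketch below shows how such a score could be computed for predicted job sizes. The function name and the sample values are hypothetical illustrations, not data or code from the paper, and the record does not state how the authors computed their score.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Sum of squared errors between actual and predicted values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Sum of squared deviations of the actual values from their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical job sizes in arbitrary units (assumed for illustration only).
actual_sizes = [10.0, 20.0, 30.0, 40.0]
predicted_sizes = [12.0, 18.0, 33.0, 39.0]
score = r_squared(actual_sizes, predicted_sizes)
```

A score of 1.0 means the predictions match the actual sizes exactly; values close to 1.0, such as the 0.97 reported in the summary, indicate that the predictions explain most of the variance in the actual job sizes.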