High-Responsive Scheduling with MapReduce Performance Prediction on Hadoop YARN
Published in | 2016 IEEE 22nd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 238-247 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.08.2016 |
Summary: | Hadoop is an open-source big data analysis platform that is widely used in both academia and industry. By decoupling resource management from the programming framework, the next generation of Hadoop, namely Hadoop YARN, accommodates various programming frameworks and can handle more kinds of workload, such as interactive analysis and stream processing. However, most existing schedulers in YARN are designed for batch processing and do not take per-job response time into account, which results in low responsiveness of the Hadoop platform. This paper proposes the FSPY (Fair Sojourn Protocol in YARN) scheduler to improve responsiveness while guaranteeing fairness. FSPY relies on job sizes, which are unknown a priori. Consequently, we also present a job size prediction mechanism for MapReduce. Experimental results show that our scheduler outperforms the Fair scheduler by 10x with respect to responsiveness under heavy workloads. Meanwhile, our prediction mechanism reaches an R² prediction accuracy of 0.97. |
---|---|
ISSN: | 2325-1301 |
DOI: | 10.1109/RTCSA.2016.51 |
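
The summary reports prediction accuracy as an R² value. As a reading aid, the sketch below shows how the coefficient of determination is commonly computed, R² = 1 - SS_res / SS_tot. It is not the paper's prediction mechanism: the function name `r_squared` and the job sizes are made-up illustration values, not data from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical job sizes (arbitrary units), invented for illustration only.
actual_sizes = [10.0, 20.0, 30.0, 40.0]
predicted_sizes = [12.0, 18.0, 33.0, 39.0]

score = r_squared(actual_sizes, predicted_sizes)
print(round(score, 3))  # 0.964
```

A value of 1.0 means the predictions match the actual sizes exactly, so a reported 0.97 indicates that the predicted sizes explain most of the variance in the actual job sizes.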