Stargazer: Toward efficient data analytics scheduling via task completion time inference
Published in | Computers & electrical engineering Vol. 92; p. 107092 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Amsterdam: Elsevier Ltd, 01.06.2021; Elsevier BV |
Summary: | The fundamental challenge of data analytics scheduling is the heterogeneity of both data analytics jobs and resources. Although many scheduling solutions have been developed to improve the efficiency of data analytics frameworks (e.g., Spark), they either (1) focus on the scheduling of a single type of resource, without considering the coordination between different resources; or (2) schedule multiple resources by factoring in limited information about analytics jobs, without considering the heterogeneity of resources. This paper presents Stargazer, a novel, efficient system that handles diverse data analytics jobs on heterogeneous clusters by inferring the completion times of their decomposed tasks. Specifically, Stargazer adopts a deep learning model that takes into consideration multiple key factors of diverse data analytics jobs and heterogeneous resources to accurately infer the completion times of different tasks. A prototype of Stargazer is fully implemented in the Spark framework. Extensive experiments show that Stargazer can reduce the average job completion time by 21% and improve average performance by 20%, while incurring little overhead. |
---|---|
ISSN: | 0045-7906 1879-0755 |
DOI: | 10.1016/j.compeleceng.2021.107092 |
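
The idea in the summary, placing tasks according to an inferred completion time on heterogeneous resources, can be illustrated with a minimal sketch. This is not Stargazer's implementation: the record does not describe the deep learning model or the scheduling policy, so `toy_predictor` is a hypothetical stand-in for the learned model, the greedy earliest-finish placement is an assumed policy, and all names (`Task`, `Executor`, `schedule`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    cpu_work: float  # hypothetical amount of compute work
    io_work: float   # hypothetical amount of I/O work


@dataclass(frozen=True)
class Executor:
    name: str
    cpu_speed: float  # compute work units processed per second
    io_speed: float   # I/O work units processed per second


def toy_predictor(task: Task, executor: Executor) -> float:
    """Stand-in for a learned completion-time model (an assumption)."""
    return task.cpu_work / executor.cpu_speed + task.io_work / executor.io_speed


def schedule(
    tasks: List[Task],
    executors: List[Executor],
    predict: Callable[[Task, Executor], float] = toy_predictor,
) -> List[Tuple[str, str, float]]:
    """Greedily place each task where its predicted finish time is earliest."""
    free_at: Dict[str, float] = {e.name: 0.0 for e in executors}
    plan: List[Tuple[str, str, float]] = []
    # Consider the largest tasks first so they get first pick of executors.
    ordered = sorted(tasks, key=lambda t: t.cpu_work + t.io_work, reverse=True)
    for task in ordered:
        best = min(executors, key=lambda e: free_at[e.name] + predict(task, e))
        finish = free_at[best.name] + predict(task, best)
        free_at[best.name] = finish
        plan.append((task.name, best.name, finish))
    return plan


if __name__ == "__main__":
    demo_tasks = [Task("cpu_heavy", 8.0, 1.0), Task("io_heavy", 1.0, 8.0)]
    demo_executors = [Executor("fast_cpu", 4.0, 1.0), Executor("fast_io", 1.0, 4.0)]
    for task_name, executor_name, finish in schedule(demo_tasks, demo_executors):
        print(f"{task_name} -> {executor_name}, predicted finish at {finish:.2f}s")
```

Because the predictor is passed in as a function, a trained model could be substituted for `toy_predictor` without changing the placement loop; whether Stargazer separates prediction and placement this way is not stated in the record.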