Stargazer: Toward efficient data analytics scheduling via task completion time inference

Bibliographic Details
Published in: Computers & Electrical Engineering, Vol. 92; p. 107092
Main Authors: Du, Haizhou; Zhang, Keke; Xiang, Qiao
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier Ltd, 01.06.2021
Elsevier BV
Summary: The fundamental challenge of data analytics scheduling is the heterogeneity of both data analytics jobs and resources. Although many scheduling solutions have been developed to improve the efficiency of data analytics frameworks (e.g., Spark), they either (1) focus on scheduling a single type of resource, without considering the coordination between different resources; or (2) schedule multiple resources using only limited information about analytics jobs, without considering the heterogeneity of resources. This paper presents Stargazer, a novel, efficient system that schedules diverse data analytics jobs on heterogeneous clusters by inferring the completion times of their decomposed tasks. Specifically, Stargazer adopts a deep learning model that takes into consideration multiple key factors of diverse data analytics jobs and heterogeneous resources to accurately infer the completion times of different tasks. A prototype of Stargazer is fully implemented in the Spark framework. Extensive experiments show that Stargazer can reduce the average job completion time by 21% and improve average performance by 20%, while incurring little overhead.
ISSN: 0045-7906, 1879-0755
DOI: 10.1016/j.compeleceng.2021.107092