TimeLink: enabling dynamic runtime prediction for Flink iterative jobs

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 80, no. 11, pp. 16546-16573
Main Authors: Yue, Xiaofei; Ding, Qingyang; Zhu, Jianming; Ding, Yanbing
Format: Journal Article
Language: English
Published: New York: Springer US, 2024; Springer Nature B.V.

More Information
Summary: With the continuing growth of data scale and computing complexity, Flink, a novel distributed computing system, has been applied in various scenarios (e.g., machine learning) owing to its excellent support for iterative processing. Predicting the runtime of Flink iterative jobs is critical to optimizing their performance. However, existing offline methods generally ignore relevant runtime information, such as cluster state variations and inter-iteration dependencies, which leads to high prediction errors in practice. Online methods, on the other hand, incur non-negligible time overhead. In light of this, we propose TimeLink, a dynamic runtime prediction algorithm for Flink iterative jobs. Its key idea consists of three stages: (1) TimeLink incorporates both offline and online execution features at runtime to measure the fine-grained similarity of iterative jobs; (2) it matches, in real time, historical jobs whose performance consumption is similar to that of the currently running iterative job; and (3) it predicts the current job's remaining runtime by exploiting the continuity of the runtime bias between the completed supersteps of the matched jobs and those of the current iterative job. We implement TimeLink and evaluate it on realistic iterative workloads. The experimental results show that TimeLink achieves relative average prediction errors of 5.91–12.86%. Moreover, it outperforms existing solutions, improving prediction accuracy by over 6.24%.
ISSN: 0920-8542; 1573-0484
DOI: 10.1007/s11227-024-06085-x
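The summary above outlines TimeLink's three stages without giving their details. What follows is a minimal Python sketch of how similarity-based matching and bias-based extrapolation could fit together. It is not the authors' implementation. Every specific is an assumption made for illustration: the hypothetical `Job` record, cosine similarity as the similarity measure, a multiplicative runtime bias averaged over a trailing window of completed supersteps, and the `top_k` and `window` parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job profile; the feature layout is an illustrative assumption."""
    name: str
    features: list[float]         # offline + online features, same order for every job
    superstep_times: list[float]  # positive seconds per superstep, in execution order


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_jobs(current: Job, history: list[Job], top_k: int = 2) -> list[tuple[float, Job]]:
    """Rank historical jobs by similarity to the running job and keep the top_k.

    Only jobs that ran more supersteps than the current job has completed are
    candidates, so every match still has remaining supersteps to extrapolate from.
    """
    done = len(current.superstep_times)
    scored = [
        (cosine_similarity(current.features, h.features), h)
        for h in history
        if len(h.superstep_times) > done
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:top_k]


def predict_remaining(current: Job, history: list[Job], top_k: int = 2, window: int = 3) -> float:
    """Similarity-weighted estimate of the running job's remaining runtime in seconds.

    For each matched job, the bias is the ratio of the current job's time to the
    matched job's time on the same superstep, averaged over the last `window`
    completed supersteps. Assuming this bias stays roughly constant, it scales the
    matched job's remaining superstep times.
    """
    done = len(current.superstep_times)
    if done == 0:
        raise ValueError("need at least one completed superstep")
    if window < 1:
        raise ValueError("window must be at least 1")
    matches = match_jobs(current, history, top_k)
    if not matches:
        raise ValueError("no historical job ran more supersteps than the current job")

    estimates = []  # (weight, remaining-runtime estimate) per matched job
    for sim, h in matches:
        recent = range(max(0, done - window), done)
        ratios = [current.superstep_times[i] / h.superstep_times[i] for i in recent]
        bias = sum(ratios) / len(ratios)
        estimates.append((max(sim, 0.0), bias * sum(h.superstep_times[done:])))

    total_weight = sum(w for w, _ in estimates)
    if total_weight == 0.0:
        # No positive similarity: fall back to an unweighted mean of the estimates.
        return sum(e for _, e in estimates) / len(estimates)
    return sum(w * e for w, e in estimates) / total_weight


if __name__ == "__main__":
    history = [
        Job("a", [1.0, 0.0], [10.0, 10.0, 10.0, 10.0]),
        Job("b", [0.0, 1.0], [5.0, 5.0, 5.0, 5.0, 5.0]),
    ]
    running = Job("current", [1.0, 0.0], [12.0, 12.0])
    print(round(predict_remaining(running, history), 2))  # prints 24.0
```

In the example, historical job `a` has the same feature vector as the running job, which is 20% slower per completed superstep. The remaining 20 s of job `a` are therefore scaled to 24 s. Job `b` has zero similarity, so it receives no weight. A practical version would normalize the features before comparing them, because with unnormalized features the large-magnitude dimensions dominate cosine similarity.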