Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous Resources

Bibliographic Details
Published in: ACM Transactions on Autonomous and Adaptive Systems, Vol. 18, no. 4, pp. 1-44
Main Authors: Russo Russo, Gabriele; Cardellini, Valeria; Lo Presti, Francesco
Format: Journal Article
Language: English
Published: New York, NY: ACM, 14.10.2023
Summary: Data Stream Processing (DSP) applications analyze data flows in near real-time by means of operators, which process and transform incoming data. Operators handle high data rates by running parallel replicas across multiple processors and hosts. To guarantee consistent performance without wasting resources in the face of variable workloads, auto-scaling techniques have been studied to adapt operator parallelism at run-time. However, most of the effort has been spent under the assumption of homogeneous computing infrastructures, neglecting the complexity of modern environments. We consider the problem of deciding both how many operator replicas should be executed and which types of computing nodes should be acquired. We devise heterogeneity-aware policies by means of a two-layered hierarchy of controllers. While application-level components steer the adaptation process for whole applications, aiming to guarantee user-specified requirements, lower-layer components control the auto-scaling of single operators. We tackle the fundamental challenge of performance and workload uncertainty, exploiting Bayesian optimization (BO) and reinforcement learning (RL) to devise policies. The evaluation shows that our approach is able to meet users' requirements in terms of response time and adaptation overhead while minimizing the cost due to resource usage, outperforming state-of-the-art baselines. We also demonstrate how partial model information can be exploited to reduce the training time of learning-based controllers.
ISSN: 1556-4665, 1556-4703
DOI: 10.1145/3597435
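The operator-level idea described in the summary (learning, via RL, how many replicas to run and on which node types while trading off response time, adaptation overhead and resource cost) can be illustrated with a small sketch. This is not the authors' algorithm; it is a minimal tabular Q-learning example under stated assumptions. The two node types ("small", "large"), their capacities and prices, the cost weights W_PERF, W_RES and W_RCF, the utilization threshold SLO_UTIL used as a stand-in for a response-time requirement, and the random-walk workload are all hypothetical. The application-level layer and Bayesian optimization are not modeled.

```python
import random
from collections import defaultdict

# Hypothetical node types: (capacity in tuples/s per replica, price per time unit).
NODE_TYPES = {"small": (100.0, 1.0), "large": (250.0, 2.0)}
MAX_PER_TYPE = 4
# Hypothetical weights of the performance, resource and reconfiguration cost terms.
W_PERF, W_RES, W_RCF = 0.4, 0.4, 0.2
# Each action changes the number of (small, large) replicas by at most one.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
SLO_UTIL = 0.8  # assumed: utilization above this means the response-time target is missed


def rate_of(level):
    """Map a discretized workload level (0..4) to an input rate in tuples/s."""
    return 100.0 * (level + 1)


def valid_actions(n_small, n_large):
    """Actions keeping each count in [0, MAX_PER_TYPE] and at least one replica overall."""
    out = []
    for ds, dl in ACTIONS:
        s, l = n_small + ds, n_large + dl
        if 0 <= s <= MAX_PER_TYPE and 0 <= l <= MAX_PER_TYPE and s + l >= 1:
            out.append((ds, dl))
    return out


def step_cost(n_small, n_large, rate, action):
    """Weighted cost in [0, 1] of one interval: SLO miss, resource price, reconfiguration."""
    cap_s, price_s = NODE_TYPES["small"]
    cap_l, price_l = NODE_TYPES["large"]
    utilization = rate / (n_small * cap_s + n_large * cap_l)
    perf = 1.0 if utilization > SLO_UTIL else 0.0
    max_price = MAX_PER_TYPE * (price_s + price_l)
    res = (n_small * price_s + n_large * price_l) / max_price
    rcf = 0.0 if action == (0, 0) else 1.0
    return W_PERF * perf + W_RES * res + W_RCF * rcf


class QLearningScaler:
    """Tabular Q-learning over states (n_small, n_large, workload_level); minimizes cost."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        # (state, action) -> estimated discounted cost. Untried actions keep an
        # estimate of 0 and therefore look cheapest, which encourages exploration.
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        acts = valid_actions(state[0], state[1])
        if self.rng.random() < self.epsilon:
            return self.rng.choice(acts)  # explore
        return min(acts, key=lambda a: self.q[(state, a)])  # exploit: lowest estimated cost

    def update(self, state, action, cost, next_state):
        best_next = min(self.q[(next_state, a)]
                        for a in valid_actions(next_state[0], next_state[1]))
        target = cost + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def train(steps=50_000, seed=1):
    """Train against a synthetic random-walk workload (an assumption, not real traces)."""
    rng = random.Random(seed)
    agent = QLearningScaler(seed=seed)
    state = (1, 0, 0)
    for _ in range(steps):
        action = agent.choose(state)
        n_small, n_large = state[0] + action[0], state[1] + action[1]
        level = min(4, max(0, state[2] + rng.choice((-1, 0, 1))))
        cost = step_cost(n_small, n_large, rate_of(level), action)
        next_state = (n_small, n_large, level)
        agent.update(state, action, cost, next_state)
        state = next_state
    return agent


if __name__ == "__main__":
    agent = train()
    agent.epsilon = 0.0  # act greedily after training
    for level in range(5):
        print(level, agent.choose((1, 0, level)))
```

In this sketch the three objectives are folded into one weighted scalar cost, so raising W_PERF relative to W_RES and W_RCF pushes the learned policy toward fewer requirement violations at the expense of more (or more expensive) replicas and more frequent reconfigurations.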