Hierarchical Auto-scaling Policies for Data Stream Processing on Heterogeneous Resources

Bibliographic Details
Published in	ACM Transactions on Autonomous and Adaptive Systems, Vol. 18, no. 4, pp. 1-44
Main Authors	Russo Russo, Gabriele; Cardellini, Valeria; Lo Presti, Francesco
Format	Journal Article
Language	English
Published	New York, NY: ACM, 14.10.2023
Subjects	Computer systems organization; Distributed architectures; General and reference; Information systems; Self-organizing autonomic computing; Stream management; Surveys and overviews; Resource management; Data Stream Processing; Auto-scaling; Reinforcement learning
Summary:	Data Stream Processing (DSP) applications analyze data flows in near real-time by means of operators, which process and transform incoming data. Operators handle high data rates by running parallel replicas across multiple processors and hosts. To guarantee consistent performance without wasting resources in the face of variable workloads, auto-scaling techniques have been studied to adapt operator parallelism at run-time. However, most of the effort has been spent under the assumption of homogeneous computing infrastructures, neglecting the complexity of modern environments. We consider the problem of deciding both how many operator replicas should be executed and which types of computing nodes should be acquired. We devise heterogeneity-aware policies by means of a two-layered hierarchy of controllers. While application-level components steer the adaptation process for whole applications, aiming to guarantee user-specified requirements, lower-layer components control auto-scaling of single operators. We tackle the fundamental challenge of performance and workload uncertainty, exploiting Bayesian optimization (BO) and reinforcement learning (RL) to devise policies. The evaluation shows that our approach is able to meet users’ requirements in terms of response time and adaptation overhead, while minimizing the cost due to resource usage, outperforming state-of-the-art baselines. We also demonstrate how partial model information is exploited to reduce training time for learning-based controllers.
ISSN:	1556-4665 1556-4703
DOI:	10.1145/3597435
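
Illustrative sketch:	The two-layered hierarchy described in the summary can be pictured roughly as follows. This is not the authors' method: the summary says the actual policies are devised with Bayesian optimization and reinforcement learning, whereas this sketch swaps in a plain exhaustive search over a known M/M/1 queueing estimate. Every name here (`NodeType`, `operator_controller`, `application_controller`), the even split of the response-time budget, and all numbers are hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class NodeType:
    """Hypothetical node type; speedup and cost are made-up quantities."""
    name: str
    speedup: float  # processing speed relative to a reference node
    cost: float     # cost per replica per time unit


def response_time(rate, base_service_rate, replicas, node):
    """Assumed M/M/1 estimate for one replica, with load split evenly."""
    mu = base_service_rate * node.speedup
    lam = rate / replicas
    return float("inf") if lam >= mu else 1.0 / (mu - lam)


def operator_controller(rate, base_service_rate, budget, node_types,
                        max_replicas=10):
    """Lower layer: cheapest (cost, replicas, node type) within the budget.

    Exhaustive search stands in for the learning-based policy.
    Returns None when no configuration meets the budget.
    """
    best = None
    for k, node in product(range(1, max_replicas + 1), node_types):
        if response_time(rate, base_service_rate, k, node) <= budget:
            cost = k * node.cost
            if best is None or cost < best[0]:
                best = (cost, k, node.name)
    return best


def application_controller(operators, rate, slo, node_types):
    """Upper layer: split the application-level response-time target
    evenly across the operators of a pipeline (an assumed heuristic)."""
    budget = slo / len(operators)
    return {
        name: operator_controller(rate, mu, budget, node_types)
        for name, mu in operators.items()
    }


if __name__ == "__main__":
    nodes = [NodeType("small", 1.0, 1.0), NodeType("large", 2.0, 1.8)]
    # Operator name -> service rate on the reference node (tuples/s).
    pipeline = {"parse": 50.0, "aggregate": 20.0}
    plan = application_controller(pipeline, rate=100.0, slo=0.2,
                                  node_types=nodes)
    for op, choice in plan.items():
        print(op, choice)
```

In this toy setting the cheaper choice differs per operator: a fast operator may be served by several small nodes, while a slow one may be cheaper on fewer large nodes. That trade-off is the kind of heterogeneity-aware decision the summary refers to, here under a fully known performance model rather than under uncertainty.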