Compositional Coordinated Resource Provisioning in Workflows With Stochastic Durations

In performance engineering of composed services, coordinated provisioning can reduce the amount of resources required to meet end-to-end response time objectives. To this aim, various intertwined aspects of the application architecture need to be taken into account, notably including precedence cons...

Full description

Saved in:

Bibliographic Details
Published in	IEEE transactions on parallel and distributed systems Vol. 36; no. 9; pp. 1937 - 1954
Main Authors	Carnevali, Laura, Paolieri, Marco, Reali, Riccardo, Scommegna, Leonardo, Vicario, Enrico
Format	Journal Article
Language	English
Published	IEEE 01.09.2025
Subjects	Cloud computing composed services Coordinated resource provisioning end-to-end response time Fitting non-Markovian durations Probability density function Quality of service Random variables Resource management Scheduling Space exploration Time factors Topology
Online Access	Get full text

Cover

Loading…

More Information
Summary:	In performance engineering of composed services, coordinated provisioning can reduce the amount of resources required to meet end-to-end response time objectives. To this aim, various intertwined aspects of the application architecture need to be taken into account, notably including precedence constraints in the composition of elementary services, along with their durations and sensitivity to the scaling of provisioned resources. We address coordinated provisioning of resources for elementary services with stochastic durations with general distributions (i.e., including non-exponential distributions). We compose services in a workflow where precedence constraints define a Directed Acyclic Graph (DAG) and the distribution of the end-to-end (E2E) response time is subject to a Service Level Objective (SLO). We leverage a surrogate model of service performance, assuming a low workload of workflow requests (i.e., a single-request scenario) and service durations inversely proportional to provisioned resources. Given the total amount of resources, our approach derives the service provisioning that optimizes the workflow E2E response time distribution, by exploiting a compositional approach and by using stochastically ordered approximations to manage dependencies in non-well-nested precedence DAGs. Then, the approach scales provisioned resources up or down to determine the minimum amount of resources needed to satisfy the SLO, while leaving the remaining resources for horizontal scaling in order to manage multiple workflow requests at high workloads. Experiments consider low-workload and high-workload scenarios, different relations between elementary service durations and provisioned resources, and workflow topologies taken from benchmarks or randomly generated with controlled statistics, using elementary service durations from a dataset of the literature. Results show that the technique is feasible also for workflows with a thousand of services and that it outperforms other provisioning methods in fitting the SLO using the same resource amount and in minimizing the resource amount needed to fit the SLO.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2025.3585821