Compositional Coordinated Resource Provisioning in Workflows With Stochastic Durations
Published in | IEEE Transactions on Parallel and Distributed Systems, Vol. 36, no. 9, pp. 1937-1954 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | IEEE, 01.09.2025 |
Summary: | In performance engineering of composed services, coordinated provisioning can reduce the amount of resources required to meet end-to-end response time objectives. To this aim, various intertwined aspects of the application architecture need to be taken into account, notably the precedence constraints in the composition of elementary services, along with their durations and their sensitivity to the scaling of provisioned resources. We address coordinated provisioning of resources for elementary services whose stochastic durations follow general distributions (i.e., including non-exponential distributions). We compose services in a workflow whose precedence constraints define a Directed Acyclic Graph (DAG), and the distribution of the end-to-end (E2E) response time is subject to a Service Level Objective (SLO). We leverage a surrogate model of service performance, assuming a low workload of workflow requests (i.e., a single-request scenario) and service durations inversely proportional to provisioned resources. Given the total amount of resources, our approach derives the service provisioning that optimizes the workflow E2E response time distribution. It does so by exploiting a compositional approach and by using stochastically ordered approximations to manage dependencies in non-well-nested precedence DAGs. Then, the approach scales provisioned resources up or down to determine the minimum amount of resources needed to satisfy the SLO, leaving the remaining resources for horizontal scaling to manage multiple workflow requests at high workloads. Experiments consider low-workload and high-workload scenarios, different relations between elementary service durations and provisioned resources, and workflow topologies taken from benchmarks or randomly generated with controlled statistics, using elementary service durations from a dataset in the literature.
Results show that the technique is feasible even for workflows with a thousand services. It also outperforms other provisioning methods both in meeting the SLO with the same amount of resources and in minimizing the amount of resources needed to meet the SLO. |
ISSN: | 1045-9219, 1558-2183 |
DOI: | 10.1109/TPDS.2025.3585821 |
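The summary describes a single-request surrogate model in which service durations are inversely proportional to provisioned resources, an E2E response time over a precedence DAG, and a search for the minimum resources that satisfy a percentile SLO. The Python sketch below illustrates that model under stated assumptions and is not the authors' method. The paper derives the E2E distribution compositionally using stochastically ordered approximations. This sketch instead substitutes plain Monte Carlo simulation and bisects on the total resource amount while keeping a fixed relative allocation. The DAG, the duration distributions, the shares, the deadline, and all function names (`sample_e2e`, `slo_met`, `min_total`) are hypothetical.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical 4-service workflow: each node maps to its predecessors.
# Illustrative only; not taken from the paper.
DAG = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

# Hypothetical base durations with one unit of resource, drawn from
# non-exponential distributions to mimic "general" service durations.
BASE = {
    "a": lambda rng: rng.lognormvariate(0.0, 0.5),
    "b": lambda rng: rng.uniform(1.0, 3.0),
    "c": lambda rng: rng.lognormvariate(0.5, 0.3),
    "d": lambda rng: rng.uniform(0.5, 1.5),
}

# Predecessors always come before their successors in this order.
ORDER = list(TopologicalSorter(DAG).static_order())


def sample_e2e(alloc, rng):
    """Draw one E2E response time for a single request (low workload)."""
    finish = {}
    for node in ORDER:
        # A service starts once all of its predecessors have finished.
        start = max((finish[p] for p in DAG[node]), default=0.0)
        # Assumption: duration is inversely proportional to resources.
        finish[node] = start + BASE[node](rng) / alloc[node]
    return max(finish.values())


def slo_met(alloc, deadline, prob, n=5000, seed=0):
    """Monte Carlo check of the SLO P(E2E <= deadline) >= prob.

    A fixed seed gives common random numbers, so the estimate is
    monotone in a uniform scaling of the allocation.
    """
    rng = random.Random(seed)
    hits = sum(sample_e2e(alloc, rng) <= deadline for _ in range(n))
    return hits / n >= prob


def min_total(shares, deadline, prob, lo=0.01, hi=100.0, iters=30):
    """Bisect the smallest total resource amount that meets the SLO,
    keeping the relative shares of the services fixed."""
    def ok(total):
        return slo_met({v: total * s for v, s in shares.items()}, deadline, prob)

    if not ok(hi):
        raise ValueError("SLO not met even at the upper bound")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical relative allocation (shares sum to 1).
SHARES = {"a": 0.2, "b": 0.3, "c": 0.3, "d": 0.2}

if __name__ == "__main__":
    total = min_total(SHARES, deadline=2.0, prob=0.95)
    print(f"minimum total resources: {total:.3f}")
```

Under pure inverse scaling, multiplying every share by a total R divides each sampled E2E time by R. The minimum total is therefore just the SLO percentile of the E2E time at R = 1 divided by the deadline. Bisection is kept because it still applies to other monotone duration-resource relations, which the summary says the experiments also consider.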