Randomized Algorithms for Scheduling Multi-Resource Jobs in the Cloud
Published in | IEEE/ACM Transactions on Networking, Vol. 26, no. 5, pp. 2202-2215 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | New York, IEEE, 01.10.2018, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | We consider the problem of scheduling jobs with multiple-resource requirements (CPU, memory, and disk) in a distributed server platform, motivated by data-parallel and cloud computing applications. Jobs arrive dynamically over time and each requires a certain amount of each resource for the duration of its service. When a job arrives, it is queued and later served by one of the servers that has sufficient remaining resources to serve it. The scheduling of jobs is subject to two constraints: 1) packing constraints: multiple jobs can be served simultaneously by a single server if their cumulative resource requirement does not exceed the capacity of the server, and 2) non-preemption: to avoid costly preemptions, once a job is scheduled on a server, its service cannot be interrupted or migrated to another server. Prior scheduling algorithms rely either on bin-packing heuristics, which have low complexity but can yield poor throughput, or on MaxWeight solutions, which can achieve maximum throughput but must repeatedly solve or approximate instances of a hard combinatorial problem (Knapsack) over time. In this paper, we propose a randomized scheduling algorithm for placing jobs in servers that can achieve maximum throughput with low complexity. The algorithm is naturally distributed, and each queue and each server needs to perform only a constant number of operations per time unit. Extensive simulation results, using both synthetic and real traffic traces, are presented to evaluate the throughput and delay performance compared to prior algorithms. |
---|---|
ISSN: | 1063-6692 1558-2566 |
DOI: | 10.1109/TNET.2018.2863647 |
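
The packing constraint described in the summary can be illustrated with a minimal sketch. This is not the paper's randomized algorithm; it only checks whether a job fits on a server alongside the jobs already running there. The function name `fits`, the tuple layout (CPU, memory, disk), and the numeric values are assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical representation: one number per resource, e.g. (cpu, memory, disk).
Resources = Sequence[int]


def fits(job: Resources, running: Sequence[Resources], capacity: Resources) -> bool:
    """Return True if `job` can be added to a server that is already serving
    `running` without exceeding `capacity` in any resource dimension."""
    for k, cap in enumerate(capacity):
        used = sum(r[k] for r in running)  # current usage of resource k
        if used + job[k] > cap:
            return False
    return True


# Illustrative values only (not taken from the paper).
capacity = (8, 32, 100)
running = [(2, 8, 10), (4, 16, 20)]
print(fits((2, 8, 50), running, capacity))  # every resource stays within capacity
print(fits((3, 4, 10), running, capacity))  # CPU would need 9 of 8 units
```

Under non-preemption, a scheduler would run a check like this only when placing a job; jobs already in `running` are never evicted or migrated to make room.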