Randomized Algorithms for Scheduling Multi-Resource Jobs in the Cloud


Bibliographic Details
Published in: IEEE/ACM Transactions on Networking, Vol. 26, No. 5, pp. 2202-2215
Main Authors: Psychas, Konstantinos; Ghaderi, Javad
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2018

Summary: We consider the problem of scheduling jobs with multiple-resource requirements (CPU, memory, and disk) in a distributed server platform, motivated by data-parallel and cloud computing applications. Jobs arrive dynamically over time and require certain amounts of multiple resources for the duration of their service. When a job arrives, it is queued and later served by one of the servers that has sufficient remaining resources to serve it. The scheduling of jobs is subject to two constraints: 1) packing constraints: multiple jobs can be served simultaneously by a single server if their cumulative resource requirement does not exceed the capacity of the server, and 2) non-preemption: to avoid costly preemptions, once a job is scheduled in a server, its service cannot be interrupted or migrated to another server. Prior scheduling algorithms rely either on bin-packing heuristics, which have low complexity but can have poor throughput, or on MaxWeight solutions, which can achieve maximum throughput but repeatedly require solving or approximating instances of a hard combinatorial problem (Knapsack) over time. In this paper, we propose a randomized scheduling algorithm for placing jobs in servers that can achieve maximum throughput with low complexity. The algorithm is naturally distributed, and each queue and each server needs to perform only a constant number of operations per time unit. Extensive simulation results, using both synthetic and real traffic traces, are presented to evaluate the throughput and delay performance compared to prior algorithms.
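
The following is a minimal illustrative sketch in Python of the two constraints described in the summary (packing and non-preemption) together with a naive randomized placement rule. It is a toy model under assumed names and data structures, not the algorithm proposed in the paper; the Job, Server, and place_randomly identifiers are hypothetical.

    import random
    from dataclasses import dataclass, field

    @dataclass
    class Job:
        # Multi-resource demand of a single job (hypothetical units).
        cpu: float
        mem: float
        disk: float

    @dataclass
    class Server:
        # Per-resource capacity of a server and the jobs it currently runs.
        cpu: float
        mem: float
        disk: float
        running: list = field(default_factory=list)

        def fits(self, job: Job) -> bool:
            # Packing constraint: cumulative demand of the running jobs plus
            # the new job must not exceed capacity in any resource dimension.
            used_cpu = sum(j.cpu for j in self.running)
            used_mem = sum(j.mem for j in self.running)
            used_disk = sum(j.disk for j in self.running)
            return (used_cpu + job.cpu <= self.cpu and
                    used_mem + job.mem <= self.mem and
                    used_disk + job.disk <= self.disk)

    def place_randomly(job: Job, servers: list) -> Server | None:
        # Naive randomized placement: sample one server and schedule the job
        # there only if it fits. Once placed, the job is never interrupted or
        # migrated (non-preemption); otherwise it stays queued and retries.
        server = random.choice(servers)
        if server.fits(job):
            server.running.append(job)
            return server
        return None

This sketch only captures the feasibility check and the non-preemption rule; the paper's contribution lies in the randomized placement policy itself, which is designed to achieve maximum throughput with constant work per queue and per server per time unit.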
ISSN: 1063-6692, 1558-2566
DOI: 10.1109/TNET.2018.2863647