Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the s...
Saved in:
Published in | IEEE transactions on parallel and distributed systems Vol. 33; no. 1; pp. 88 - 100 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
01.01.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Be the first to leave a comment!