GScheduler: Optimizing resource provision by using GPU usage pattern extraction in cloud environments


Bibliographic Details
Published in: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3225 - 3230
Main Authors: Zhuqing Xu, Fang Dong, Jiahui Jin, Junzhou Luo, Jun Shen
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2017

Summary: GPU-based clusters are widely used to accelerate a variety of scientific applications in high-end cloud environments. With their growing popularity, there is a need to improve system throughput and reduce turnaround time for applications co-executing on the same GPU device. However, resource contention among multiple applications on a multi-tasked GPU degrades application performance. Previous approaches either cannot accurately learn the characteristics of a GPU application before execution or cannot obtain that information in a timely manner, which can lead to misleading scheduling decisions. In this paper, we present GScheduler, a framework that detects and reduces interference between co-executing applications in a GPU-based cloud. Its key feature is a GPU usage pattern extractor for detecting interference between applications, composed of a key function-call graph extractor and a key GPU resource usage vector extractor: the former detects the similarity of GPU usage modes between applications, while the latter computes the similarity of their GPU resource requirements. In addition, an interference-aware scheduler is proposed to minimize this interference. We evaluated the framework on 26 diverse, real-world CUDA applications. Compared with state-of-the-art interference-oblivious schedulers, it improves system throughput by 36% on average and reduces turnaround time by 30.5% on average.
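The abstract's core idea — comparing GPU resource usage vectors to decide which applications can share a device with low interference — can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the vector components, the cosine-similarity metric, and the greedy pairing heuristic below are all illustrative assumptions, standing in for the key GPU resource usage vector extractor and interference-aware scheduler the summary describes.

```python
from math import sqrt

def cosine_similarity(u, v):
    # Similarity of two hypothetical GPU resource usage vectors,
    # e.g. [SM occupancy, memory-bandwidth share]. High similarity
    # suggests the apps would contend for the same resources.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def pair_least_interfering(apps):
    # Greedy heuristic: repeatedly co-schedule the pair of apps whose
    # usage vectors are LEAST similar (least likely to interfere).
    # `apps` maps app name -> resource usage vector.
    names = list(apps)
    pairs = []
    while len(names) >= 2:
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                sim = cosine_similarity(apps[names[i]], apps[names[j]])
                if best is None or sim < best[0]:
                    best = (sim, i, j)
        _, i, j = best
        pairs.append((names[i], names[j]))
        names.pop(j)  # remove j first so index i stays valid
        names.pop(i)
    return pairs

# Example: a compute-bound and a memory-bound app pair well together,
# while two compute-bound apps would contend for SMs.
apps = {"matmul": [0.9, 0.2], "memcopy": [0.1, 0.95], "conv": [0.85, 0.3]}
print(pair_least_interfering(apps))  # pairs matmul with memcopy; conv runs alone
```

In the paper itself, this similarity signal is combined with the key function-call graph comparison before the scheduler makes a placement decision; the sketch above only shows the vector-similarity half of that idea.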
DOI: 10.1109/SMC.2017.8123125