MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

Bibliographic Details
Published in	arXiv.org
Main Authors	Zhang, Huaizheng; Li, Yuanming; Xiao, Wencong; Huang, Yizheng; Xing, Di; Yin, Jianxiong; See, Simon; Luo, Yong; Lau, Chiew Tong; Yang, You
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 01.01.2023
Subjects	Benchmarks; Deep learning; Inference; Training; Workload; Workloads
Summary:	Recent GPU architectures such as the NVIDIA A100 support multi-instance GPU (MIG) technology, which allows a single GPU to be partitioned into multiple small, isolated instances. This gives users more flexibility in supporting both deep learning training and inference workloads, but using it efficiently remains challenging. This paper aims to provide a more comprehensive and practical benchmark study of MIG, eliminating the need for tedious manual benchmarking and tuning. To that end, it presents MIGPerf, an open-source tool that streamlines benchmark studies for MIG. Using MIGPerf, the authors conduct a series of experiments, including characterization of deep learning training and inference on MIG, characterization of GPU sharing, and framework compatibility with MIG. The results provide new insights and guidance for using MIG effectively and lay the foundation for further research on orchestrating hybrid training and inference workloads on MIG. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress, and more results will be published soon.
ISSN:	2331-8422