MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs
Main Authors | Zhang, Huaizheng; Li, Yuanming; Xiao, Wencong; Huang, Yizheng; Di, Xing; Yin, Jianxiong; See, Simon; Luo, Yong; Lau, Chiew Tong; You, Yang |
---|---|
Format | Journal Article (arXiv preprint) |
Language | English |
Published | 01.01.2023 |
Subjects | Computer Science - Learning; Computer Science - Performance |
DOI | 10.48550/arXiv.2301.00407 |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Source | arXiv.org (Open Access Repository) |
Online Access | https://arxiv.org/abs/2301.00407 |
Abstract | New architecture GPUs like the A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but utilizing it efficiently can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon. |
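
For context on what such a benchmark measures, below is a minimal sketch of timing ResNet-50 inference on a single MIG slice. It is not taken from MIGPerf/MIGProfiler; the script, its parameters, and the device UUID are illustrative assumptions, and it only assumes PyTorch and torchvision are installed on a MIG-enabled GPU.

```python
# Minimal latency-benchmark sketch for a single MIG slice.
# NOTE: illustrative only -- this is NOT the MIGPerf/MIGProfiler API.
import os
import time

# CUDA_VISIBLE_DEVICES must be set before CUDA is initialized; the UUID below
# is a placeholder -- list real MIG device UUIDs with `nvidia-smi -L`.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")

import torch
import torchvision  # requires torchvision >= 0.13 for the `weights` argument


def benchmark_inference(batch_size: int = 32, warmup: int = 10, iters: int = 50) -> float:
    """Return the mean forward-pass latency (ms) of ResNet-50 on the visible MIG slice."""
    device = torch.device("cuda")
    model = torchvision.models.resnet50(weights=None).eval().to(device)
    x = torch.randn(batch_size, 3, 224, 224, device=device)

    with torch.no_grad():
        for _ in range(warmup):          # warm up kernels / cuDNN autotuning
            model(x)
        torch.cuda.synchronize()

        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()         # wait until all queued kernels finish
    return (time.perf_counter() - start) / iters * 1e3


if __name__ == "__main__":
    print(f"mean latency: {benchmark_inference():.2f} ms/batch")
```

Setting CUDA_VISIBLE_DEVICES to a MIG UUID restricts the process to that one slice, which is how a workload is isolated per instance; repeating such a measurement across the available A100 MIG profiles (e.g., 1g.5gb through 7g.40gb) is the kind of characterization the paper's benchmark automates.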