MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

Bibliographic Details
Main Authors: Zhang, Huaizheng; Li, Yuanming; Xiao, Wencong; Huang, Yizheng; Di, Xing; Yin, Jianxiong; See, Simon; Luo, Yong; Lau, Chiew Tong; You, Yang
Format: Journal Article
Language: English
Published: 01.01.2023

Abstract: New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
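To make the partitioning the abstract describes concrete: a MIG-capable GPU is carved into instances drawn from fixed profiles, each consuming a number of compute slices out of a fixed budget. The following is a simplified, illustrative Python sketch (not part of MIGPerf) using the standard A100-40GB profile names; real MIG placement has additional constraints beyond the slice count modeled here.

```python
# Standard MIG profiles on an A100-40GB; names follow NVIDIA's
# "<compute slices>g.<memory>gb" scheme. The GPU has 7 compute slices total.
PROFILES = {
    "1g.5gb": 1,
    "2g.10gb": 2,
    "3g.20gb": 3,
    "4g.20gb": 4,
    "7g.40gb": 7,  # the whole GPU as a single instance
}

TOTAL_SLICES = 7

def fits(partition):
    """Return True if the requested instances fit in the compute-slice budget.

    Simplification: real MIG placement also restricts which profile
    combinations and positions are valid, not just the slice sum.
    """
    return sum(PROFILES[p] for p in partition) <= TOTAL_SLICES

print(fits(["3g.20gb", "3g.20gb", "1g.5gb"]))  # 3 + 3 + 1 = 7 slices -> True
print(fits(["4g.20gb", "4g.20gb"]))            # 8 slices -> False
```

This slice-budget view is why benchmarking matters: many partitions are feasible, and which one serves a given mix of training and inference workloads best is exactly what a tool like MIGPerf measures.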
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arxiv.2301.00407
Database: arXiv Computer Science (arXiv.org)
DOI Open Access: Yes
Open Access: Yes
Peer Reviewed: No
Scholarly: No
Open Access Link: https://arxiv.org/abs/2301.00407
Resource Type: Preprint
Source: arXiv (Open Access Repository)
Subject Terms: Computer Science - Learning; Computer Science - Performance
Link Provider: Cornell University