MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs
Main Authors | Zhang, Huaizheng; Li, Yuanming; Xiao, Wencong; Huang, Yizheng; Di, Xing; Yin, Jianxiong; See, Simon; Luo, Yong; Lau, Chiew Tong; You, Yang |
---|---|
Format | Journal Article (arXiv preprint) |
Language | English |
Published | 01.01.2023 |
Subjects | Computer Science - Learning; Computer Science - Performance |
DOI | 10.48550/arXiv.2301.00407 |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Source | arXiv.org (Open Access Repository) |
Online Access | https://arxiv.org/abs/2301.00407 |
Abstract | New architecture GPUs like the A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but utilizing it efficiently can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released at https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon. |
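
For context on what such a benchmark measures, below is a minimal sketch of timing ResNet-50 inference on a single MIG slice. It is not taken from MIGPerf/MIGProfiler; the script, its parameters, and the device UUID are illustrative assumptions, and it only assumes PyTorch and torchvision are installed on a MIG-enabled GPU.

```python
# Minimal latency-benchmark sketch for a single MIG slice.
# NOTE: illustrative only -- this is NOT the MIGPerf/MIGProfiler API.
import os
import time

# CUDA_VISIBLE_DEVICES must be set before CUDA is initialized; the UUID below
# is a placeholder -- list real MIG device UUIDs with `nvidia-smi -L`.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")

import torch
import torchvision  # requires torchvision >= 0.13 for the `weights` argument


def benchmark_inference(batch_size: int = 32, warmup: int = 10, iters: int = 50) -> float:
    """Return the mean forward-pass latency (ms) of ResNet-50 on the visible MIG slice."""
    device = torch.device("cuda")
    model = torchvision.models.resnet50(weights=None).eval().to(device)
    x = torch.randn(batch_size, 3, 224, 224, device=device)

    with torch.no_grad():
        for _ in range(warmup):          # warm up kernels / cuDNN autotuning
            model(x)
        torch.cuda.synchronize()

        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()         # wait until all queued kernels finish
    return (time.perf_counter() - start) / iters * 1e3


if __name__ == "__main__":
    print(f"mean latency: {benchmark_inference():.2f} ms/batch")
```

Setting CUDA_VISIBLE_DEVICES to a MIG UUID restricts the process to that one slice, which is how a workload is isolated per instance; repeating such a measurement across the available A100 MIG profiles (e.g., 1g.5gb through 7g.40gb) is the kind of characterization the paper's benchmark automates.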