Automatic parallelism strategy generation with minimal memory redundancy

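Published in Frontiers of Information Technology & Electronic Engineering (信息与电子工程前沿(英文版)) Vol. 26; no. 1; pp. 109 - back insert 8
Main Authors Yanqi SHI (时彦琦), Peng LIANG (梁鹏), Hao ZHENG (郑浩), Linbo QIAO (乔林波), Dongsheng LI (李东升)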
Format Journal Article
Language Chinese
Published National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China 2025
ISSN 2095-9184
DOI 10.1631/FITEE.2300684

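Abstract Due to memory and computing resource limitations, large-scale deep learning models are usually trained in a distributed manner. Existing strategy generation methods rarely take minimizing memory footprint as their objective. To address this, a new algorithm is proposed that generates automatic parallelism strategies with the objective of minimizing memory redundancy. A redundant memory cost model is proposed to compute the memory overhead of each operator under a given parallel strategy. To ensure that the optimal parallel strategy is generated, the strategy search problem is formulated as an integer linear programming problem, and an efficient solver is used to find the intra-operator parallelism strategy with the minimal memory footprint. The proposed method is implemented in a multi-dimensional parallel training framework; experimental results show that, compared with the latest Megatron-LM method, it saves up to 67% of memory overhead while throughput differs little.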
Abstract_FL Large-scale deep learning models are trained in a distributed manner due to memory and computing resource limitations. Few existing strategy generation approaches take memory minimization as the objective. To fill this gap, we propose a novel algorithm that generates optimal parallelism strategies with the objective of minimal memory redundancy. We propose a redundant memory cost model to calculate the memory overhead of each operator in a given parallel strategy. To generate the optimal parallelism strategy, we formulate the parallelism strategy search problem as an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Furthermore, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework and is characterized by high throughput and minimal memory redundancy. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared to the latest Megatron-LM strategies, while the gap between the throughput of our approach and that of its counterparts is small.
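The abstract describes casting strategy search as an integer linear program. One common way to read that is: a 0-1 variable per (operator, candidate strategy) pair, exactly one strategy chosen per operator, linear constraints ruling out incompatible pairs, and a linear objective summing per-operator memory cost. The following is a minimal sketch of that reading only. The operator names, candidate strategies, cost numbers, and the forbidden pair are all hypothetical and not taken from the paper, whose actual cost model and constraints are not given in this record. Exhaustive enumeration over the same 0-1 decision space stands in for the ILP solver so the example needs no third-party library.

```python
from itertools import product

# Hypothetical redundant-memory cost (arbitrary units) of each candidate
# strategy for each operator. Illustrative numbers, not from the paper.
COSTS = {
    "matmul_1": {"replicate": 8, "row_split": 3, "col_split": 2},
    "gelu": {"replicate": 4, "row_split": 2, "col_split": 1},
    "matmul_2": {"replicate": 8, "row_split": 2, "col_split": 4},
}

# Hypothetical incompatible pairs. In an ILP each pair would be the linear
# constraint x[a, s] + x[b, t] <= 1 on the 0-1 selection variables.
FORBIDDEN = [(("gelu", "col_split"), ("matmul_2", "row_split"))]


def solve(costs, forbidden):
    """Return (assignment, cost) minimizing total memory cost.

    Every tuple produced by `product` picks exactly one strategy per
    operator, which mirrors the one-hot constraint sum_s x[op, s] == 1.
    """
    ops = list(costs)
    best, best_cost = None, None
    for choice in product(*(costs[op] for op in ops)):
        assignment = dict(zip(ops, choice))
        # Skip assignments that violate any pairwise constraint.
        if any(assignment[a] == s and assignment[b] == t
               for (a, s), (b, t) in forbidden):
            continue
        # Linear objective: sum of the selected per-operator costs.
        cost = sum(costs[op][assignment[op]] for op in ops)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    assignment, cost = solve(COSTS, FORBIDDEN)
    print(assignment, cost)
```

With these toy numbers the unconstrained minimum costs 5, and the forbidden pair pushes the best feasible assignment to a cost of 6. Enumeration grows exponentially with the number of operators, which is why a real formulation would hand the same variables and constraints to an ILP solver.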
Author 梁鹏
李东升
郑浩
时彦琦
乔林波
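AuthorAffiliation National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China
AuthorAffiliation_xml – name: National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China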
Author_FL Hao ZHENG
Yanqi SHI
Peng LIANG
Linbo QIAO
Dongsheng LI
Author_FL_xml – sequence: 1
  fullname: Yanqi SHI
– sequence: 2
  fullname: Peng LIANG
– sequence: 3
  fullname: Hao ZHENG
– sequence: 4
  fullname: Linbo QIAO
– sequence: 5
  fullname: Dongsheng LI
Author_xml – sequence: 1
  fullname: 时彦琦
– sequence: 2
  fullname: 梁鹏
– sequence: 3
  fullname: 郑浩
– sequence: 4
  fullname: 乔林波
– sequence: 5
  fullname: 李东升
ClassificationCodes TP181
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Copyright_xml – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DOI 10.1631/FITEE.2300684
DatabaseName Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
China Online Journals (COJ)
Discipline Engineering
DocumentTitle_FL Automatic parallelism strategy generation with minimal memory redundancy
EndPage back insert 8
ExternalDocumentID zjdxxbc_e202501009
ISSN 2095-9184
IngestDate Thu May 29 04:06:16 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Deep learning
Minimal memory redundancy
Automatic parallelism
Language Chinese
PublicationCentury 2000
PublicationDate 2025
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 2025
PublicationDecade 2020
PublicationTitle 信息与电子工程前沿(英文版)
PublicationTitle_FL Frontiers of Information Technology & Electronic Engineering
PublicationYear 2025
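Publisher National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China
Publisher_xml – name: National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China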
StartPage 109
Title Automatic parallelism strategy generation with minimal memory redundancy
URI https://d.wanfangdata.com.cn/periodical/zjdxxbc-e202501009
Volume 26