A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models
Published in | IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ... ) (Online) Vol. 8; pp. 1336 - 1340 |
---|---|
Main Authors | Gong, YuBin; Liu, JingJing; Huang, Xu; Bi, XueKe; Cui, WenBo |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 14.03.2025 |
ISSN | 2693-289X |
DOI | 10.1109/ITOEC63606.2025.10968894 |
Abstract | In the booming era of artificial intelligence, the demand for computing resources in large AI model training is increasing exponentially. However, the traditional centralized computing resource management model fails to meet this growing demand. This paper focuses on the resource utilization issue of multiple computing center nodes and introduces the Deep Q-Network (DQN) algorithm to optimize the logic and strategy of computing task scheduling. A system model encompassing a specific state space, action space, and reward mechanism is established. A one-dimensional deep neural network (DNN) serves as the DQN training model and is trained with the Mean Squared Error (MSE) loss function and the RMSProp optimizer. In the experiment, the DQN is compared with five traditional algorithm models. The results show that the DQN exhibits the smallest and most balanced fluctuations in resource utilization and achieves the best performance among the compared methods. Consequently, it offers an effective solution for computing resource management. |
---|---|
Author | Liu, JingJing; Cui, WenBo; Bi, XueKe; Huang, Xu; Gong, YuBin |
Author_xml | 1. Gong, YuBin (22120054@bjtu.edu.cn), Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China; 2. Liu, JingJing (liujingjing@ln.chinamobile.com), China Mobile Group Liaoning Company Limited, China; 3. Huang, Xu (21111024@bjtu.edu.cn), Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China; 4. Bi, XueKe (23125015@bjtu.edu.cn), Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China; 5. Cui, WenBo (24115017@bjtu.edu.cn), Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China |
ContentType | Conference Proceeding |
DOI | 10.1109/ITOEC63606.2025.10968894 |
EISBN | 9798331529482 |
EISSN | 2693-289X |
EndPage | 1340 |
Genre | orig-research |
GrantInformation_xml | Fundamental Research Funds for the Central Universities (funder ID: 10.13039/501100012226); Nature (funder ID: 10.13039/501100020487) |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 5 |
PublicationCentury | 2000 |
PublicationDate | 2025-March-14 |
PublicationDateYYYYMMDD | 2025-03-14 |
PublicationDecade | 2020 |
PublicationTitle | IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ... ) (Online) |
PublicationTitleAbbrev | ITOEC |
PublicationYear | 2025 |
Publisher | IEEE |
StartPage | 1336 |
SubjectTerms | Computational modeling; Computing Network; DQN; Fluctuations; Mechatronics; Optimization; Reinforcement learning; Reliability engineering; Resource management; Resource Utilization Rate; Scheduling; Scheduling algorithms; Training |
Title | A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models |
URI | https://ieeexplore.ieee.org/document/10968894 |
Volume | 8 |
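
The abstract names a state space, an action space, a reward mechanism, an MSE loss, and the RMSProp optimizer, but this record does not give their definitions. The following is only a minimal sketch of how such a DQN-style update could look, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the state is a vector of per-node utilizations, the action is the index of the node that receives the next task, the reward is the negative standard deviation of utilization, a linear Q-function stands in for the paper's one-dimensional DNN, and the names `step_env` and `dqn_update` as well as all hyperparameters are hypothetical. The replay buffer and target network of a full DQN are omitted for brevity.

```python
import numpy as np

N_NODES = 4                        # hypothetical number of computing nodes
GAMMA = 0.9                        # discount factor (assumed)
LR, DECAY, EPS = 0.01, 0.9, 1e-8   # RMSProp hyperparameters (assumed)

rng = np.random.default_rng(0)
# Q(s, .) = s @ W: a linear stand-in for the paper's one-dimensional DNN.
W = rng.normal(scale=0.1, size=(N_NODES, N_NODES))
sq_avg = np.zeros_like(W)          # RMSProp running average of squared gradients


def step_env(util, action, load=0.1):
    """Assign a task of size `load` to node `action` (toy environment, assumed)."""
    nxt = util.copy()
    nxt[action] = min(1.0, nxt[action] + load)
    reward = -float(np.std(nxt))   # assumed reward: a smaller spread is better
    return nxt, reward


def dqn_update(W, sq_avg, s, a, r, s_next):
    """One TD step minimizing (Q(s, a) - target)^2 with a hand-written RMSProp update."""
    target = r + GAMMA * np.max(s_next @ W)   # bootstrap target, treated as a constant
    td_error = (s @ W)[a] - target
    grad = np.zeros_like(W)
    grad[:, a] = 2.0 * td_error * s           # gradient of the squared error w.r.t. column a
    sq_avg = DECAY * sq_avg + (1.0 - DECAY) * grad ** 2
    W = W - LR * grad / (np.sqrt(sq_avg) + EPS)
    return W, sq_avg, td_error ** 2


util = np.array([0.2, 0.5, 0.3, 0.4])         # made-up initial utilizations
losses = []
for _ in range(200):
    # Epsilon-greedy action selection over the nodes.
    if rng.random() < 0.1:
        a = int(rng.integers(N_NODES))
    else:
        a = int(np.argmax(util @ W))
    nxt, r = step_env(util, a)
    W, sq_avg, loss = dqn_update(W, sq_avg, util, a, r, nxt)
    losses.append(loss)
    util = nxt * 0.9                          # assumed decay: running tasks finish over time
```

The reward here simply penalizes uneven utilization, which mirrors the abstract's emphasis on small, balanced fluctuations; the paper's actual reward mechanism may differ.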