A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models

Bibliographic Details
Published in	IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ... ) (Online) Vol. 8; pp. 1336 - 1340
Main Authors	Gong, YuBin; Liu, JingJing; Huang, Xu; Bi, XueKe; Cui, WenBo
Format	Conference Proceeding
Language	English
Published	IEEE 14.03.2025
Subjects	Computational modeling; Computing Network; DQN; Fluctuations; Mechatronics; Optimization; Reinforcement learning; Reliability engineering; Resource management; Resource Utilization Rate; Scheduling; Scheduling algorithms; Training
ISSN	2693-289X
DOI	10.1109/ITOEC63606.2025.10968894

Abstract	In the booming era of artificial intelligence, the demand for computing resources in AI large-model training is increasing exponentially. However, the traditional centralized computing resource management model fails to meet this growing demand. This paper focuses on the resource utilization issue of multiple computing center nodes and introduces the Deep Q-Network (DQN) algorithm to optimize the logic and strategy of computing task scheduling. A system model encompassing a specific state space, action space, and reward mechanism is established. A one-dimensional DNN network serves as the DQN training model, which is trained by means of the Mean Squared Error (MSE) loss function and the RMSProp optimizer. In the experiment, the DQN is compared with five traditional algorithm models. The results demonstrate that the DQN exhibits the smallest and most balanced fluctuations in resource utilization, boasting the best performance. Consequently, it offers an effective solution for computing resource management.
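The abstract names the ingredients (state space, action space, reward, MSE loss, RMSProp) without giving their definitions, so the following is only a minimal sketch under loudly stated assumptions, not the authors' method. It assumes the state is the per-node utilization plus the incoming task's demand, the action is the index of the node that receives the task, and the reward is the negative variance of utilization across nodes. A linear Q-function stands in for the paper's one-dimensional DNN, and the replay buffer and target network usually found in DQN are omitted. All names (`step`, `LinearQ`, `td_target`, `train`) are hypothetical.

```python
import math
import random


def utilization_variance(util):
    """Population variance of per-node utilization (lower means more balanced)."""
    mean = sum(util) / len(util)
    return sum((u - mean) ** 2 for u in util) / len(util)


def step(util, action, demand):
    """Place a task on node `action`; return (next_util, reward)."""
    nxt = list(util)
    nxt[action] = min(1.0, nxt[action] + demand)
    # Assumed reward: penalize imbalance between nodes.
    return nxt, -utilization_variance(nxt)


class LinearQ:
    """Linear stand-in for the Q-network: Q(s, a) = w[a] . [s, 1]."""

    def __init__(self, n_features, n_actions, lr=0.01, decay=0.9, eps=1e-8, seed=0):
        rng = random.Random(seed)
        # One weight row per action; the last entry of each row is the bias.
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features + 1)]
                  for _ in range(n_actions)]
        # RMSProp running average of squared gradients, same shape as w.
        self.v = [[0.0] * (n_features + 1) for _ in range(n_actions)]
        self.lr, self.decay, self.eps = lr, decay, eps

    def q_values(self, state):
        x = list(state) + [1.0]
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def update(self, state, action, target):
        """One RMSProp step on the squared error (Q(s, a) - target)^2."""
        x = list(state) + [1.0]
        err = self.q_values(state)[action] - target
        for i, xi in enumerate(x):
            g = 2.0 * err * xi  # gradient of the squared error w.r.t. w[action][i]
            self.v[action][i] = self.decay * self.v[action][i] + (1 - self.decay) * g * g
            self.w[action][i] -= self.lr * g / (math.sqrt(self.v[action][i]) + self.eps)
        return err * err  # loss before the step


def td_target(model, reward, next_state, gamma=0.9):
    """Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(model.q_values(next_state))


def train(n_nodes=3, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy online training on a toy task stream (all values illustrative)."""
    rng = random.Random(seed)
    model = LinearQ(n_features=n_nodes + 1, n_actions=n_nodes, seed=seed)
    util = [0.0] * n_nodes
    demand = rng.uniform(0.05, 0.2)
    for _ in range(steps):
        state = util + [demand]
        if rng.random() < epsilon:
            action = rng.randrange(n_nodes)
        else:
            q = model.q_values(state)
            action = q.index(max(q))
        util, reward = step(util, action, demand)
        util = [0.9 * u for u in util]  # assumption: running tasks gradually finish
        demand = rng.uniform(0.05, 0.2)
        model.update(state, action, td_target(model, reward, util + [demand]))
    return model
```

Calling `train()` returns the fitted model; `model.q_values(util + [demand])` then scores each node for a new task, and the highest-scoring index is the greedy scheduling decision. Swapping `LinearQ` for a neural network trained with a framework's MSE loss and RMSProp optimizer would bring the sketch closer to the setup the abstract describes.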
Author	Liu, JingJing; Cui, WenBo; Bi, XueKe; Huang, Xu; Gong, YuBin
Author_xml
– sequence: 1; givenname: YuBin; surname: Gong; fullname: Gong, YuBin; email: 22120054@bjtu.edu.cn; organization: Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China
– sequence: 2; givenname: JingJing; surname: Liu; fullname: Liu, JingJing; email: liujingjing@ln.chinamobile.com; organization: China Mobile Group Liaoning Company Limited, China
– sequence: 3; givenname: Xu; surname: Huang; fullname: Huang, Xu; email: 21111024@bjtu.edu.cn; organization: Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China
– sequence: 4; givenname: XueKe; surname: Bi; fullname: Bi, XueKe; email: 23125015@bjtu.edu.cn; organization: Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China
– sequence: 5; givenname: WenBo; surname: Cui; fullname: Cui, WenBo; email: 24115017@bjtu.edu.cn; organization: Beijing Jiaotong University, College of Electronic and Information Engineering, Beijing, China
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/ITOEC63606.2025.10968894
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN	9798331529482
EISSN	2693-289X
EndPage	1340
ExternalDocumentID	10968894
Genre	orig-research
GrantInformation_xml	– fundername: Fundamental Research Funds for the Central Universities funderid: 10.13039/501100012226 – fundername: Nature funderid: 10.13039/501100020487
IngestDate	Wed Apr 30 05:50:36 EDT 2025
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
PageCount	5
ParticipantIDs	ieee_primary_10968894
PublicationCentury	2000
PublicationDate	2025-March-14
PublicationDateYYYYMMDD	2025-03-14
PublicationDate_xml	– month: 03 year: 2025 text: 2025-March-14 day: 14
PublicationDecade	2020
PublicationTitle	IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ... ) (Online)
PublicationTitleAbbrev	ITOEC
PublicationYear	2025
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0003203989
Score	1.9045936
SourceID	ieee
SourceType	Publisher
StartPage	1336
URI	https://ieeexplore.ieee.org/document/10968894
Volume	8
hasFullText	1
inHoldings	1