A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models

Bibliographic Details
Published in IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ... ) (Online) Vol. 8; pp. 1336-1340
Main Authors Gong, YuBin; Liu, JingJing; Huang, Xu; Bi, XueKe; Cui, WenBo
Format Conference Proceeding
Language English
Published IEEE 14.03.2025
ISSN 2693-289X
DOI 10.1109/ITOEC63606.2025.10968894

Abstract Demand for computing resources in large AI model training is increasing exponentially, and the traditional centralized model of computing resource management cannot meet it. This paper addresses resource utilization across multiple computing center nodes and applies the Deep Q-Network (DQN) algorithm to optimize the logic and strategy of computing task scheduling. A system model is established that defines the state space, action space, and reward mechanism. A one-dimensional deep neural network (DNN) serves as the DQN model and is trained with the Mean Squared Error (MSE) loss function and the RMSProp optimizer. In the experiment, the DQN is compared with five traditional algorithms. The results show that the DQN produces the smallest and most balanced fluctuations in resource utilization, giving the best performance among the compared methods. It therefore offers an effective solution for computing resource management.
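The abstract names the ingredients of the method (state space, action space, reward, MSE loss, RMSProp) but this record does not give their definitions. The following is only a minimal sketch of how such pieces typically fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the state is taken to be the per-node utilization, an action assigns the next task to one node, the reward is the negative variance of utilization, a linear function stands in for the one-dimensional DNN, and all hyperparameter values are invented.

```python
# Hypothetical setup (not from the paper): state = utilization of each node in [0, 1],
# action i = send the next task to node i.
GAMMA = 0.9   # discount factor (assumed)
LR = 0.01     # RMSProp learning rate (assumed)
DECAY = 0.9   # RMSProp moving-average decay (assumed)
EPS = 1e-8    # numerical stability term


def q_values(weights, state):
    """Linear stand-in for the Q-network: one weight row per action, last entry is a bias."""
    return [sum(w * s for w, s in zip(row[:-1], state)) + row[-1] for row in weights]


def reward(state):
    """Assumed reward: negative variance of utilization, so balanced load scores higher."""
    mean = sum(state) / len(state)
    return -sum((u - mean) ** 2 for u in state) / len(state)


def td_target(weights, r, next_state):
    """Bootstrapped target r + gamma * max over next actions of Q(next_state, a)."""
    return r + GAMMA * max(q_values(weights, next_state))


def rmsprop_step(weights, cache, state, action, target):
    """One RMSProp step on the squared TD error of the chosen action; returns that error squared."""
    error = q_values(weights, state)[action] - target
    features = list(state) + [1.0]  # trailing 1.0 multiplies the bias weight
    for j, x in enumerate(features):
        grad = 2.0 * error * x  # gradient of (q - target)^2 with the target held fixed
        cache[action][j] = DECAY * cache[action][j] + (1 - DECAY) * grad * grad
        weights[action][j] -= LR * grad / (cache[action][j] ** 0.5 + EPS)
    return error * error
```

In a full DQN the target would normally come from a separate, periodically synchronized target network and the updates would be drawn from a replay buffer; both are omitted here to keep the sketch short.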
Author_xml – sequence: 1
  givenname: YuBin
  surname: Gong
  fullname: Gong, YuBin
  email: 22120054@bjtu.edu.cn
  organization: Beijing Jiaotong University,College of Electronic and Information Engineering,Beijing,China
– sequence: 2
  givenname: JingJing
  surname: Liu
  fullname: Liu, JingJing
  email: liujingjing@ln.chinamobile.com
  organization: China Mobile Group Liaoning Company Limited,China
– sequence: 3
  givenname: Xu
  surname: Huang
  fullname: Huang, Xu
  email: 21111024@bjtu.edu.cn
  organization: Beijing Jiaotong University,College of Electronic and Information Engineering,Beijing,China
– sequence: 4
  givenname: XueKe
  surname: Bi
  fullname: Bi, XueKe
  email: 23125015@bjtu.edu.cn
  organization: Beijing Jiaotong University,College of Electronic and Information Engineering,Beijing,China
– sequence: 5
  givenname: WenBo
  surname: Cui
  fullname: Cui, WenBo
  email: 24115017@bjtu.edu.cn
  organization: Beijing Jiaotong University,College of Electronic and Information Engineering,Beijing,China
ContentType Conference Proceeding
DOI 10.1109/ITOEC63606.2025.10968894
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 9798331529482
EISSN 2693-289X
EndPage 1340
ExternalDocumentID 10968894
Genre orig-research
GrantInformation_xml – fundername: Fundamental Research Funds for the Central Universities
  funderid: 10.13039/501100012226
– fundername: Nature
  funderid: 10.13039/501100020487
IngestDate Wed Apr 30 05:50:36 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
PageCount 5
ParticipantIDs ieee_primary_10968894
PublicationCentury 2000
PublicationDate 2025-March-14
PublicationDateYYYYMMDD 2025-03-14
PublicationDate_xml – month: 03
  year: 2025
  text: 2025-March-14
  day: 14
PublicationDecade 2020
PublicationTitle IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ... ) (Online)
PublicationTitleAbbrev ITOEC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1336
SubjectTerms Computational modeling
Computing Network
DQN
Fluctuations
Mechatronics
Optimization
Reinforcement learning
Reliability engineering
Resource management
Resource Utilization Rate
Scheduling
Scheduling algorithms
Training
Title A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models
URI https://ieeexplore.ieee.org/document/10968894
Volume 8
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3PS8MwFA5uJ08qTvxNDl5T2ybtkuOYG5vgVJiw20jT93S4rbJ1F_96k6ydKAjeQkgg5LX58l6-7z1CbtCCEFpbswQgYwIBmTaJYqqtUxEZiXHihMIPo3TwIu4nyaQSq3stDAB48hkErunf8vPCbFyozP7hKpVSiQZpWM9tK9baBVR4HHIlVc3WCdXtcPzY67p8WI6KECdBPf1HIRWPI_0DMqpXsKWPvAebMgvM56_kjP9e4iFpfUv26NMOjI7IHiyPienQu-cRc0iV01UVqKfWn7X44mTotLAHxqJSYlI9fy1Ws_JtQe1FlnqmITO-6IMbuixyWNMC6dxxx2lnSH0RnXWLjPu9cXfAqqoKbKZ4aRFJSpTIreUwzUDEbZOaWBrFM2G_KmndH5NqDUJimyeRVho5aGHvJRI4Ss5PSHNZLOGUUDQ6Vyh4HmEmRBxqRBChDjMhY63D9Iy03AZNP7Z5M6b13pz_0X9B9p2dHMMrEpekWa42cGUhv8yuvam_AOyPrgc
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3PS8MwFA46D3pSceJvc_Da2jZplxzH3Nh0qwoVdhtpmqfDrZWtu_jXm2TtREHwFkIIIa_N9_Lyfe8hdAMahEDb2gmVSh0KChwhQ-7wloioLxkEoREKj-Ko_0Lvx-G4EqtbLYxSypLPlGua9i0_K-TKhMr0H84jxjjdRjsa-Clfy7U2IRUSeIQzXvN1PH47SB67HZMRy5ARgtCtJ_hRSsUiSW8fxfUa1gSSd3dVpq78_JWe8d-LPEDNb9EeftrA0SHaUvkRkm189xw7BqsyvKhC9VjfaDXCGCE6LvSRMa-0mFjMXovFtHybY-3KYss1dKQt-2CG5kWmlrgAPDPscdweYFtGZ9lESa-bdPpOVVfBmXJSakxiDBgQbTuIUkWDloxkwCQnKdXfFdMXIBkJoSiDFgl9wQUQJaj2TJgiwAg5Ro28yNUJwiBFxoGSzIeU0sATAIp6wkspC4TwolPUNBs0-VhnzpjUe3P2R_812u0no-FkOIgfztGesZnhe_n0AjXKxUpdagegTK-s2b8APDGxVw
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=IEEE+...+Information+Technology+and+Mechatronics+Engineering+Conference+%28ITOEC+...+%29+%28Online%29&rft.atitle=A+DQN-based+resource+scheduling+optimization+algorithm+for+multi-computing+nodes+of+large+AI+models&rft.au=Gong%2C+YuBin&rft.au=Liu%2C+JingJing&rft.au=Huang%2C+Xu&rft.au=Bi%2C+XueKe&rft.date=2025-03-14&rft.pub=IEEE&rft.eissn=2693-289X&rft.volume=8&rft.spage=1336&rft.epage=1340&rft_id=info:doi/10.1109%2FITOEC63606.2025.10968894&rft.externalDocID=10968894