A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models

Bibliographic Details
Published in: IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ...) (Online), Vol. 8, pp. 1336-1340
Main Authors: Gong, YuBin; Liu, JingJing; Huang, Xu; Bi, XueKe; Cui, WenBo
Format Conference Proceeding
Language: English
Published: IEEE, 14.03.2025
ISSN: 2693-289X
DOI: 10.1109/ITOEC63606.2025.10968894

Summary: In the booming era of artificial intelligence, the demand for computing resources for training large AI models is growing exponentially, and the traditional centralized model of computing resource management fails to meet it. This paper focuses on resource utilization across multiple computing-center nodes and introduces the Deep Q-Network (DQN) algorithm to optimize the logic and strategy of computing-task scheduling. A system model is established with a defined state space, action space, and reward mechanism. A one-dimensional DNN serves as the DQN training model and is trained with the Mean Squared Error (MSE) loss function and the RMSProp optimizer. In experiments, the DQN is compared against five traditional scheduling algorithms. The results show that the DQN yields the smallest and most balanced fluctuations in resource utilization, the best performance among the compared methods, and thus offers an effective solution for computing resource management.
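The training setup described in the abstract (a small network over a one-dimensional state vector, an MSE-based temporal-difference loss, and RMSProp updates) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's actual implementation: the node count, the state (per-node utilization), the reward (negative spread of utilizations, favoring balance), and all network sizes and hyperparameters are assumptions introduced here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NODES = 4          # hypothetical number of computing nodes
STATE_DIM = N_NODES  # state: current utilization of each node
N_ACTIONS = N_NODES  # action: assign the next task to node i
GAMMA = 0.9          # discount factor

def init_net(hidden=32):
    """One-hidden-layer MLP over the 1-D state vector (a stand-in
    for the paper's 'one-dimensional DNN')."""
    return {
        "W1": rng.normal(0, 0.1, (STATE_DIM, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, N_ACTIONS)),
        "b2": np.zeros(N_ACTIONS),
    }

def forward(net, s):
    h = np.maximum(0.0, s @ net["W1"] + net["b1"])  # ReLU hidden layer
    return h, h @ net["W2"] + net["b2"]             # Q-value per action

def rmsprop_step(net, grads, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """Manual RMSProp: per-parameter running average of squared grads."""
    for k in net:
        cache[k] = decay * cache[k] + (1 - decay) * grads[k] ** 2
        net[k] -= lr * grads[k] / (np.sqrt(cache[k]) + eps)

def td_update(net, cache, s, a, r, s_next):
    """One MSE-based temporal-difference step on Q(s, a)."""
    h, q = forward(net, s)
    _, q_next = forward(net, s_next)
    target = r + GAMMA * q_next.max()        # bootstrapped TD target
    err = q[a] - target                      # d(MSE)/dQ up to a factor
    dq = np.zeros(N_ACTIONS)
    dq[a] = err                              # loss touches only Q(s, a)
    grads = {"W2": np.outer(h, dq), "b2": dq}
    dh = (net["W2"] @ dq) * (h > 0)          # backprop through ReLU
    grads["W1"] = np.outer(s, dh)
    grads["b1"] = dh
    rmsprop_step(net, grads, cache)
    return 0.5 * err ** 2                    # scalar MSE loss, for logging

# Toy scheduling loop: assigning a task raises that node's load,
# all tasks drain over time, and balanced utilization is rewarded.
net = init_net()
cache = {k: np.zeros_like(v) for k, v in net.items()}
s = rng.random(STATE_DIM)
for step in range(500):
    if rng.random() < 0.1:                   # epsilon-greedy exploration
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(forward(net, s)[1].argmax())
    s_next = s.copy()
    s_next[a] = min(1.0, s_next[a] + 0.1)    # new task loads node a
    s_next = s_next * 0.95                   # running tasks complete
    r = -float(np.std(s_next))               # reward balanced utilization
    loss = td_update(net, cache, s, a, r, s_next)
    s = s_next
```

The reward here encodes the "balanced fluctuations in resource utilization" criterion the abstract evaluates; a real system would derive the state and reward from measured node telemetry rather than this synthetic load model.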