A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models

Bibliographic Details
Published in: IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ...) (Online), Vol. 8, pp. 1336-1340
Main Authors: Gong, YuBin; Liu, JingJing; Huang, Xu; Bi, XueKe; Cui, WenBo
Format Conference Proceeding
Language: English
Published: IEEE, 14.03.2025
ISSN: 2693-289X
DOI: 10.1109/ITOEC63606.2025.10968894

Summary: In the booming era of artificial intelligence, the demand for computing resources for training large AI models is growing exponentially, and the traditional centralized model of computing resource management fails to meet it. This paper focuses on resource utilization across multiple computing-center nodes and introduces the Deep Q-Network (DQN) algorithm to optimize the logic and strategy of computing-task scheduling. A system model is established with a defined state space, action space, and reward mechanism. A one-dimensional DNN serves as the DQN training model and is trained with the Mean Squared Error (MSE) loss function and the RMSProp optimizer. In experiments, the DQN is compared against five traditional scheduling algorithms. The results show that the DQN yields the smallest and most balanced fluctuations in resource utilization, the best performance among the compared methods, and thus offers an effective solution for computing resource management.
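The training setup described in the abstract (a small network over a one-dimensional state vector, an MSE-based temporal-difference loss, and RMSProp updates) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's actual implementation: the node count, the state (per-node utilization), the reward (negative spread of utilizations, favoring balance), and all network sizes and hyperparameters are assumptions introduced here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NODES = 4          # hypothetical number of computing nodes
STATE_DIM = N_NODES  # state: current utilization of each node
N_ACTIONS = N_NODES  # action: assign the next task to node i
GAMMA = 0.9          # discount factor

def init_net(hidden=32):
    """One-hidden-layer MLP over the 1-D state vector (a stand-in
    for the paper's 'one-dimensional DNN')."""
    return {
        "W1": rng.normal(0, 0.1, (STATE_DIM, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, N_ACTIONS)),
        "b2": np.zeros(N_ACTIONS),
    }

def forward(net, s):
    h = np.maximum(0.0, s @ net["W1"] + net["b1"])  # ReLU hidden layer
    return h, h @ net["W2"] + net["b2"]             # Q-value per action

def rmsprop_step(net, grads, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """Manual RMSProp: per-parameter running average of squared grads."""
    for k in net:
        cache[k] = decay * cache[k] + (1 - decay) * grads[k] ** 2
        net[k] -= lr * grads[k] / (np.sqrt(cache[k]) + eps)

def td_update(net, cache, s, a, r, s_next):
    """One MSE-based temporal-difference step on Q(s, a)."""
    h, q = forward(net, s)
    _, q_next = forward(net, s_next)
    target = r + GAMMA * q_next.max()        # bootstrapped TD target
    err = q[a] - target                      # d(MSE)/dQ up to a factor
    dq = np.zeros(N_ACTIONS)
    dq[a] = err                              # loss touches only Q(s, a)
    grads = {"W2": np.outer(h, dq), "b2": dq}
    dh = (net["W2"] @ dq) * (h > 0)          # backprop through ReLU
    grads["W1"] = np.outer(s, dh)
    grads["b1"] = dh
    rmsprop_step(net, grads, cache)
    return 0.5 * err ** 2                    # scalar MSE loss, for logging

# Toy scheduling loop: assigning a task raises that node's load,
# all tasks drain over time, and balanced utilization is rewarded.
net = init_net()
cache = {k: np.zeros_like(v) for k, v in net.items()}
s = rng.random(STATE_DIM)
for step in range(500):
    if rng.random() < 0.1:                   # epsilon-greedy exploration
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(forward(net, s)[1].argmax())
    s_next = s.copy()
    s_next[a] = min(1.0, s_next[a] + 0.1)    # new task loads node a
    s_next = s_next * 0.95                   # running tasks complete
    r = -float(np.std(s_next))               # reward balanced utilization
    loss = td_update(net, cache, s, a, r, s_next)
    s = s_next
```

The reward here encodes the "balanced fluctuations in resource utilization" criterion the abstract evaluates; a real system would derive the state and reward from measured node telemetry rather than this synthetic load model.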