A DQN-based resource scheduling optimization algorithm for multi-computing nodes of large AI models
Published in: IEEE ... Information Technology and Mechatronics Engineering Conference (ITOEC ...) (Online), Vol. 8, pp. 1336-1340
Format: Conference Proceeding
Language: English
Publisher: IEEE, 14.03.2025
ISSN: 2693-289X
DOI: 10.1109/ITOEC63606.2025.10968894
Summary: In the booming era of artificial intelligence, the demand for computing resources for training large AI models is growing exponentially, and the traditional centralized model of computing-resource management can no longer keep pace. This paper addresses the resource-utilization problem across multiple computing-center nodes by applying the Deep Q-Network (DQN) algorithm to optimize the logic and strategy of computing-task scheduling. A system model is established with a defined state space, action space, and reward mechanism. A one-dimensional DNN serves as the DQN training model and is trained with the Mean Squared Error (MSE) loss function and the RMSProp optimizer. In the experiments, the DQN is compared against five traditional algorithm models; the results show that the DQN exhibits the smallest and most balanced fluctuations in resource utilization, the best performance among the models tested. It therefore offers an effective solution for computing-resource management.
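The training setup named in the abstract (a small fully connected Q-network updated with an MSE loss and the RMSProp optimizer) can be sketched as below. This is a minimal illustration only: the network dimensions, hyperparameters, and the synthetic scheduling batch are assumptions for demonstration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not publish its exact architecture.
STATE_DIM, HIDDEN, N_ACTIONS, GAMMA = 8, 32, 4, 0.99

# One-hidden-layer Q-network parameters (the "one-dimensional DNN" of the abstract).
W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS)); b2 = np.zeros(N_ACTIONS)

# RMSProp accumulators, one per parameter tensor.
cache = {"W1": np.zeros_like(W1), "b1": np.zeros_like(b1),
         "W2": np.zeros_like(W2), "b2": np.zeros_like(b2)}
LR, DECAY, EPS = 1e-3, 0.9, 1e-8

def forward(S):
    H = np.maximum(0.0, S @ W1 + b1)   # ReLU hidden layer
    return H, H @ W2 + b2              # Q-values, one per scheduling action

def rmsprop(name, param, grad):
    # RMSProp: scale the step by a running average of squared gradients.
    cache[name] = DECAY * cache[name] + (1 - DECAY) * grad ** 2
    param -= LR * grad / (np.sqrt(cache[name]) + EPS)

def train_step(S, a, r, S2, done):
    """One DQN update: regress Q(s,a) toward r + gamma * max_a' Q(s',a') with MSE."""
    H, Q = forward(S)
    _, Q2 = forward(S2)
    target = r + GAMMA * Q2.max(axis=1) * (1.0 - done)
    pred = Q[np.arange(len(a)), a]
    err = pred - target
    loss = np.mean(err ** 2)           # the MSE loss named in the abstract
    # Backpropagate through the selected-action outputs only.
    dQ = np.zeros_like(Q)
    dQ[np.arange(len(a)), a] = 2 * err / len(a)
    gW2 = H.T @ dQ; gb2 = dQ.sum(axis=0)
    dH = (dQ @ W2.T) * (H > 0)
    gW1 = S.T @ dH; gb1 = dH.sum(axis=0)
    for name, p, g in [("W1", W1, gW1), ("b1", b1, gb1),
                       ("W2", W2, gW2), ("b2", b2, gb2)]:
        rmsprop(name, p, g)
    return loss

# Synthetic scheduling transitions for illustration only.
B = 32
S = rng.normal(size=(B, STATE_DIM))
a = rng.integers(0, N_ACTIONS, B)
r = rng.normal(size=B)
S2 = rng.normal(size=(B, STATE_DIM))
losses = [train_step(S, a, r, S2, np.zeros(B)) for _ in range(50)]
```

In a real scheduler, the state vector would encode per-node load and queue metrics, the actions would assign a task to a node, and the reward would penalize utilization imbalance; a replay buffer and a separate target network (standard in DQN, omitted here for brevity) would stabilize training.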