Multi-UAV pursuit-evasion gaming based on PSO-M3DDPG schemes

Bibliographic Details
Published in: Complex & Intelligent Systems, Vol. 10, no. 5, pp. 6867–6883
Main Authors: Zhang, Yaozhong; Ding, Meiyan; Zhang, Jiandong; Yang, Qiming; Shi, Guoqing; Lu, Meiqu; Jiang, Frank
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.10.2024
Springer Nature B.V
Springer

Summary: The sample data for reinforcement learning algorithms are often sparse and unstable, making the training results prone to local optima. The Mini-Max Multi-agent Deep Deterministic Policy Gradient (M3DDPG) algorithm is a multi-agent reinforcement learning algorithm that introduces the minimax theorem into the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm; it too suffers from unstable convergence caused by sparse sample data and randomization. In contrast to traditional reinforcement learning methods, the Particle Swarm Optimisation (PSO) algorithm constructs independent populations of policy networks to generate sample data, which are then used to train the reinforcement learning algorithm. PSO optimizes and updates the policy population according to a fitness function, with the aim of improving the efficiency and convergence speed of learning from the sample data. To address the multi-agent pursuit-evasion problem, we propose the PSO-M3DDPG algorithm, which combines the PSO algorithm with the M3DDPG algorithm. Experimental simulations show that the improved algorithm achieves better training results and faster convergence, validating its effectiveness.
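To illustrate the population-based update the summary describes, the sketch below implements a minimal standard PSO loop. It is not the paper's PSO-M3DDPG: in the paper each particle would encode one policy network's parameters and the fitness would come from evaluating rollouts, whereas here a toy convex function stands in for the fitness; the function name, hyperparameters, and bounds are all assumptions for illustration.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimisation loop (illustrative sketch).

    Each particle is a candidate parameter vector; in a PSO-M3DDPG-style
    setup it would encode one policy network and `fitness` would score a
    simulated pursuit-evasion rollout. Here `fitness` is any callable
    mapping a 1-D array to a scalar to be minimised.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, (n_particles, dim))  # particle positions
    vel = np.zeros_like(pos)                          # particle velocities
    pbest = pos.copy()                                # personal best positions
    pbest_val = np.apply_along_axis(fitness, 1, pos)  # personal best scores
    gbest = pbest[np.argmin(pbest_val)].copy()        # global best position
    gbest_val = pbest_val.min()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.apply_along_axis(fitness, 1, pos)
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if vals.min() < gbest_val:
            gbest_val = vals.min()
            gbest = pos[vals.argmin()].copy()
    return gbest, gbest_val

# Toy fitness: squared norm, standing in for a policy-evaluation score.
best, val = pso_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
```

In the combined scheme, the best-performing particles of such a population would seed the experience buffer and policy updates of the M3DDPG learner, which is what the fitness-driven update is meant to accelerate.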
ISSN: 2199-4536
2198-6053
DOI: 10.1007/s40747-024-01504-1