Multi-UAV pursuit-evasion gaming based on PSO-M3DDPG schemes

Bibliographic Details
Published in: Complex & Intelligent Systems, Vol. 10, no. 5, pp. 6867–6883
Main Authors: Zhang, Yaozhong; Ding, Meiyan; Zhang, Jiandong; Yang, Qiming; Shi, Guoqing; Lu, Meiqu; Jiang, Frank
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing, 01.10.2024
Springer Nature B.V
Springer

Summary: The sample data for reinforcement learning algorithms are often sparse and unstable, making the training results prone to local optima. The Mini-Max Multi-agent Deep Deterministic Policy Gradient (M3DDPG) algorithm is a multi-agent reinforcement learning algorithm that introduces the minimax theorem into the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm; it too suffers from unstable convergence caused by sparse sample data and randomization. In contrast to traditional reinforcement learning methods, the Particle Swarm Optimisation (PSO) algorithm constructs independent populations of policy networks to generate sample data, which are then used to train the reinforcement learning algorithm. PSO optimizes and updates the policy population according to a fitness function, with the aim of improving the efficiency and convergence speed of learning from the sample data. To address the multi-agent pursuit-evasion problem, we propose the PSO-M3DDPG algorithm, which combines the PSO algorithm with the M3DDPG algorithm. Experimental simulations show that the improved algorithm achieves better training results and faster convergence, validating its effectiveness.
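To illustrate the population-based update the summary describes, the sketch below implements a minimal standard PSO loop. It is not the paper's PSO-M3DDPG: in the paper each particle would encode one policy network's parameters and the fitness would come from evaluating rollouts, whereas here a toy convex function stands in for the fitness; the function name, hyperparameters, and bounds are all assumptions for illustration.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimisation loop (illustrative sketch).

    Each particle is a candidate parameter vector; in a PSO-M3DDPG-style
    setup it would encode one policy network and `fitness` would score a
    simulated pursuit-evasion rollout. Here `fitness` is any callable
    mapping a 1-D array to a scalar to be minimised.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, (n_particles, dim))  # particle positions
    vel = np.zeros_like(pos)                          # particle velocities
    pbest = pos.copy()                                # personal best positions
    pbest_val = np.apply_along_axis(fitness, 1, pos)  # personal best scores
    gbest = pbest[np.argmin(pbest_val)].copy()        # global best position
    gbest_val = pbest_val.min()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.apply_along_axis(fitness, 1, pos)
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if vals.min() < gbest_val:
            gbest_val = vals.min()
            gbest = pos[vals.argmin()].copy()
    return gbest, gbest_val

# Toy fitness: squared norm, standing in for a policy-evaluation score.
best, val = pso_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
```

In the combined scheme, the best-performing particles of such a population would seed the experience buffer and policy updates of the M3DDPG learner, which is what the fitness-driven update is meant to accelerate.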
ISSN: 2199-4536
2198-6053
DOI: 10.1007/s40747-024-01504-1