UAV Pursuit-Evasion Game based on M2GPI algorithm

Bibliographic Details
Published in	2024 36th Chinese Control and Decision Conference (CCDC), pp. 795-800
Main Authors	Zhang, Yaozhong; Ding, Meiyan; Xu, Tianyue; Wu, Zhuoran; Xu, Zixiang
Format	Conference Proceeding
Language	English
Published	IEEE 25.05.2024
Subjects	Artificial neural networks; Autonomous aerial vehicles; Differential games; Game theory; Games; Iterative algorithms; Nash equilibrium; One-on-one pursuit-evasion game; Reinforcement learning; UAVs

Summary:	Unmanned Aerial Vehicle (UAV) technology has been a research hotspot in recent years. UAVs have become more intelligent, more widely used in military applications, and more difficult to defend against. As a typical differential game in air combat, the one-on-one pursuit-evasion game of UAVs has received wide attention. To solve this game, we combine the Minimax Q algorithm with a deep neural network and generalized policy iteration, and propose an improved Minimax Q network learning algorithm based on Generalized Policy Iteration and a fitted Q function (M2GPI). Building on the classic Minimax Q algorithm, M2GPI makes two contributions: (1) a neural network is introduced to fit the Q function, replacing the Q table used by Minimax Q, so that the algorithm can be applied to large-scale problems; (2) generalized policy iteration is introduced to solve for the Nash equilibrium of both agents at each time step, which improves the update efficiency of the algorithm. M2GPI obtains an effective policy by replacing the optimal solution with the game-theoretic equilibrium solution, which both improves convergence efficiency and makes the policy reasonable. Experimental results show that M2GPI outperforms Minimax Q in convergence speed and task success rate, supporting the rationality and superiority of the algorithm.
ISSN:	1948-9447
DOI:	10.1109/CCDC62350.2024.10587795
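
Illustrative note: the summary refers to the classic tabular Minimax Q algorithm, in which each state's value is the equilibrium value of a zero-sum matrix game over the two agents' actions. The sketch below shows that baseline only, not M2GPI and not the authors' code. The function names, the dictionary layout of the Q table, and the use of fictitious play as an approximate equilibrium solver are all assumptions made for illustration (the usual formulation solves a linear program instead).

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # payoff[a][o]: pursuer action a, evader action o


def solve_zero_sum(payoff: Matrix, iterations: int = 20000) -> Tuple[List[float], float]:
    """Approximate the row player's maximin mixed strategy and a lower bound
    on the game value using fictitious play (an assumed stand-in solver)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    # Cumulative payoff each row action would have earned against the
    # column player's action history.
    row_scores = [0.0] * n_rows
    # Cumulative payoff each column action would have conceded against the
    # row player's action history.
    col_scores = [0.0] * n_cols
    row_action, col_action = 0, 0
    for _ in range(iterations):
        row_counts[row_action] += 1
        for i in range(n_rows):
            row_scores[i] += payoff[i][col_action]
        for j in range(n_cols):
            col_scores[j] += payoff[row_action][j]
        # Each player best-responds to the opponent's empirical play so far.
        row_action = max(range(n_rows), key=lambda i: row_scores[i])
        col_action = min(range(n_cols), key=lambda j: col_scores[j])
    strategy = [count / iterations for count in row_counts]
    # Payoff the empirical row strategy secures against a best-responding opponent.
    value = min(col_scores) / iterations
    return strategy, value


def minimax_q_update(
    q: Dict[str, Matrix],
    values: Dict[str, float],
    state: str,
    a: int,
    o: int,
    reward: float,
    next_state: str,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> None:
    """One tabular Minimax Q step: move Q(s, a, o) toward r + gamma * V(s'),
    then refresh V(s) as the approximate matrix-game value of Q(s, ., .)."""
    target = reward + gamma * values[next_state]
    q[state][a][o] = (1 - alpha) * q[state][a][o] + alpha * target
    _, values[state] = solve_zero_sum(q[state])


if __name__ == "__main__":
    # Matching pennies: the equilibrium strategy is uniform and the value is 0.
    strategy, value = solve_zero_sum([[1.0, -1.0], [-1.0, 1.0]])
    print(strategy, value)
```

In this sketch the Q table is a dictionary keyed by hypothetical state labels. According to the summary, M2GPI replaces that table with a neural network and uses generalized policy iteration to obtain the equilibrium; neither of those steps is reproduced here.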