UAV Pursuit-Evasion Game based on M2GPI algorithm

Bibliographic Details
Published in 2024 36th Chinese Control and Decision Conference (CCDC), pp. 795-800
Main Authors Zhang, Yaozhong, Ding, Meiyan, Xu, Tianyue, Wu, Zhuoran, Xu, Zixiang
Format Conference Proceeding
Language English
Published IEEE 25.05.2024

Summary: Unmanned Aerial Vehicle (UAV) technology has been a research hotspot in recent years. UAVs have become more intelligent, more widely used in military applications, and more difficult to defend against. As a typical differential game in air combat, the one-to-one pursuit-evasion game of UAVs has attracted wide attention. To solve this game, we combine the Minimax Q algorithm with a deep neural network and generalized policy iteration, and propose an improved Mini-Max Q network learning algorithm based on Generalized Policy Iteration and a fitted Q function (M2GPI). Building on the classic Minimax Q algorithm, M2GPI makes two contributions: (1) a neural network is introduced to fit the Q function, replacing the Q table of the Minimax Q algorithm, so that the algorithm can be applied to large-scale data problems; (2) generalized policy iteration is introduced to solve for the Nash equilibrium of both agents at each time step, which improves the update efficiency of the algorithm. M2GPI obtains an effective policy by replacing the optimal solution with the game-theoretic equilibrium solution, which improves convergence efficiency and keeps the policy reasonable. Experimental results show that M2GPI outperforms the Minimax Q algorithm in both convergence speed and task success rate, which demonstrates the rationality and superiority of the M2GPI algorithm.
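For orientation, the classic tabular Minimax Q baseline named in the summary can be sketched as follows. This is not the M2GPI algorithm: the record gives no pseudocode, and the network-fitted Q function and generalized policy iteration are not reproduced here. The function names `solve_2xn` and `minimax_q_update`, the nested-list layout `Q[state][own_action][opponent_action]`, and the restriction to an agent with exactly two actions (which allows an exact solution without a linear-programming library) are all assumptions made for illustration.

```python
from itertools import combinations


def solve_2xn(q):
    """Maximin mixed strategy for a maximizing agent with exactly two actions.

    q[a][o] is the agent's payoff for own action a and opponent action o.
    Returns (p, value), where p is the probability of choosing action 0.
    Assumption: two own actions only; larger games need a linear program.
    """
    n = len(q[0])

    def worst_case(p):
        # Opponent picks the column that minimizes the agent's expected payoff.
        return min(p * q[0][o] + (1 - p) * q[1][o] for o in range(n))

    # The worst-case payoff is concave and piecewise linear in p, so its
    # maximum lies at an endpoint or where two column payoff lines cross.
    candidates = {0.0, 1.0}
    for i, j in combinations(range(n), 2):
        slope_i = q[0][i] - q[1][i]
        slope_j = q[0][j] - q[1][j]
        if slope_i != slope_j:
            p = (q[1][j] - q[1][i]) / (slope_i - slope_j)
            if 0.0 <= p <= 1.0:
                candidates.add(p)

    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)


def minimax_q_update(Q, s, a, o, reward, s_next, alpha=0.1, gamma=0.9):
    """One tabular Minimax Q step on the hypothetical table Q[s][a][o]."""
    # The next-state value is the game value of the stage game at s_next,
    # not a plain max over own actions as in single-agent Q-learning.
    _, v_next = solve_2xn(Q[s_next])
    Q[s][a][o] += alpha * (reward + gamma * v_next - Q[s][a][o])
    return Q[s][a][o]


if __name__ == "__main__":
    # Matching pennies as a toy stage game: the equilibrium mixes 50/50.
    p, v = solve_2xn([[1.0, -1.0], [-1.0, 1.0]])
    print(p, v)
```

The only difference from single-agent Q-learning is the backup target: the value of the next state is the equilibrium value of a zero-sum stage game, which is the quantity the summary refers to when it speaks of replacing the optimal solution with the equilibrium solution.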
ISSN:1948-9447
DOI:10.1109/CCDC62350.2024.10587795