The Optimal Strategies of Maneuver Decision in Air Combat of UCAV Based on the Improved TD3 Algorithm

Bibliographic Details
Published in	Drones (Basel) Vol. 8; no. 9; p. 501
Main Authors	Gao, Xianzhong, Zhang, Yue, Wang, Baolai, Leng, Zhihui, Hou, Zhongxi
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.09.2024
Subjects	Air combat; Air defense; Air defenses; Air warfare; Aircraft; Aircraft pilots; Algorithms; Antiairborne warfare; Artificial intelligence; autonomous air combat; Combat aircraft; Decision making; Deep learning; deep reinforcement learning; Degrees of freedom; Drone aircraft; Dynamic models; Efficiency; Ground stations; Knowledge; maneuver decision-making; Maneuvers; Markov analysis; Military aspects; Optimization; Overloading; scenario-transfer training; Swarm intelligence; Unmanned aerial vehicles; unmanned combat aerial vehicles (UCAVs); Velocity; China
Summary:	Unmanned aerial vehicles (UAVs) now pose a significant challenge to air defense systems, and unmanned combat aerial vehicles (UCAVs) have proven effective in practice at countering them. Maneuver decision-making has therefore become a crucial technology for autonomous UCAV air combat. To address the maneuver decision-making problem, this paper proposes an autonomous decision-making model for UCAVs based on deep reinforcement learning. First, a six-degree-of-freedom (DoF) dynamic model is built in three-dimensional space, with the continuous actions of tangential overload, normal overload, and roll angle selected as the maneuver inputs. Second, to speed up convergence of the deep reinforcement learning method, the idea of “scenario-transfer training” is introduced into the twin delayed deep deterministic policy gradient (TD3) algorithm; the results show that the improved algorithm cuts training time by about 60%. Third, the optimal maneuver generated by the proposed method is analyzed for “nose-to-nose turns”, one of the classical maneuvers used by experienced pilots. The results show that the maneuver strategy obtained by the proposed method is highly consistent with that chosen by experienced fighter pilots. This is also the first publicly available article to compare maneuver decisions made by a deep reinforcement learning method with those of experienced fighter pilots. This research can provide a useful reference for generating autonomous decision-making strategies for UCAVs.
ISSN:	2504-446X
DOI:	10.3390/drones8090501
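
The summary names tangential overload, normal overload, and roll angle as the continuous maneuver inputs. As a rough illustration of how those three controls drive an aircraft's motion, the sketch below integrates the simplified point-mass equations that are commonly used in air-combat simulation. This is an assumption made for illustration only: the summary describes a six-DoF model, and the paper's actual equations, state variables, and integration scheme are not given in this record. The names `State`, `step`, and `G` are hypothetical.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


@dataclass
class State:
    """Hypothetical point-mass state; not the paper's state vector."""
    x: float      # north position, m
    y: float      # east position, m
    z: float      # altitude, m (positive up)
    v: float      # speed, m/s
    gamma: float  # flight-path angle, rad
    psi: float    # heading angle, rad


def step(s: State, nx: float, nz: float, mu: float, dt: float) -> State:
    """Advance the state by one explicit Euler step.

    nx: tangential overload (along the velocity vector, in g)
    nz: normal overload (perpendicular to the velocity vector, in g)
    mu: roll angle about the velocity vector, rad
    """
    # Speed changes with tangential overload minus the gravity component.
    v_dot = G * (nx - math.sin(s.gamma))
    # The vertical part of the normal overload pitches the flight path.
    gamma_dot = (G / s.v) * (nz * math.cos(mu) - math.cos(s.gamma))
    # The horizontal part of the normal overload turns the heading.
    psi_dot = G * nz * math.sin(mu) / (s.v * math.cos(s.gamma))

    return State(
        x=s.x + s.v * math.cos(s.gamma) * math.cos(s.psi) * dt,
        y=s.y + s.v * math.cos(s.gamma) * math.sin(s.psi) * dt,
        z=s.z + s.v * math.sin(s.gamma) * dt,
        v=s.v + v_dot * dt,
        gamma=s.gamma + gamma_dot * dt,
        psi=s.psi + psi_dot * dt,
    )


if __name__ == "__main__":
    start = State(x=0.0, y=0.0, z=1000.0, v=200.0, gamma=0.0, psi=0.0)
    # Level flight: no tangential overload, 1 g normal overload, wings level.
    print(step(start, nx=0.0, nz=1.0, mu=0.0, dt=0.1))
    # Level banked turn: 2 g at 60 degrees of roll keeps altitude and turns.
    print(step(start, nx=0.0, nz=2.0, mu=math.radians(60.0), dt=0.1))
```

In a reinforcement learning setup of the kind the summary describes, a policy would output `(nx, nz, mu)` at each step and a function like `step` would play the role of the environment transition. The sketch is singular at `v = 0` and at a vertical flight path (`cos(gamma) = 0`), so a practical simulator would bound the state and the control inputs.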