Autonomous navigation of UAV in multi-obstacle environments based on a Deep Reinforcement Learning approach

Bibliographic Details
Published in: Applied Soft Computing, Vol. 115, p. 108194
Main Authors: Zhang, Sitong; Li, Yibing; Dong, Qianhui
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.01.2022

Summary: Path planning is one of the most essential parts of autonomous navigation. Most existing works assume that the environment is static and fixed. However, path planning is widely needed in random and dynamic environments (such as search and rescue, surveillance, and other scenarios). In this paper, we propose a Deep Reinforcement Learning (DRL)-based method that enables unmanned aerial vehicles (UAVs) to execute navigation tasks in multi-obstacle environments with randomness and dynamics. The method is based on the Twin Delayed Deep Deterministic Policy Gradients (TD3) algorithm. In order to predict the impact of the environment on the UAV, the change of environment observations is added to the Actor–Critic network input, and a two-stream Actor–Critic network structure is proposed to extract features of the environment observations. Simulations are carried out to evaluate the performance of the algorithm, and the experimental results show that our method enables the UAV to complete autonomous navigation tasks safely in multi-obstacle environments, which reflects the efficiency of our method. Moreover, compared to DDPG and the conventional TD3, our method has better generalization ability.

Highlights:
• A new type of TD3 network is proposed to complete the autonomous UAV navigation task.
• Unity 3D is used as a simulator to reduce the reality gap and analyze UAV behaviors.
• The proposed algorithm has better adaptability in random and dynamic environments.
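
To make the two-stream idea concrete, the following is a minimal, hypothetical PyTorch sketch of an actor that processes the current observation and the change of observations between steps in separate branches before fusing them into a bounded continuous action, in the spirit of a TD3 deterministic actor. All class and parameter names, layer sizes, and dimensions are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only (assumed names and sizes), not the paper's code.
    import torch
    import torch.nn as nn

    class TwoStreamActor(nn.Module):
        def __init__(self, obs_dim, action_dim, max_action=1.0, hidden=128):
            super().__init__()
            # Stream 1: features of the current environment observation
            self.obs_stream = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Stream 2: features of the observation change (delta) between steps
            self.delta_stream = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Fusion head producing a bounded continuous UAV action
            self.head = nn.Sequential(
                nn.Linear(2 * hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim), nn.Tanh(),
            )
            self.max_action = max_action

        def forward(self, obs, obs_delta):
            f_obs = self.obs_stream(obs)
            f_delta = self.delta_stream(obs_delta)
            return self.max_action * self.head(torch.cat([f_obs, f_delta], dim=-1))

    # Usage: obs_delta would be obs_t - obs_{t-1}; a critic in the same spirit
    # would take (obs, obs_delta, action) as input.
    actor = TwoStreamActor(obs_dim=24, action_dim=3)
    action = actor(torch.randn(1, 24), torch.randn(1, 24))

Under these assumptions, the delta stream is what lets the network react to how obstacles are moving rather than only to where they currently are, which is the motivation the abstract gives for augmenting the Actor–Critic input.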
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2021.108194