Deep hierarchical reinforcement learning based formation planning for multiple unmanned surface vehicles with experimental results


Bibliographic Details
Published in: Ocean Engineering, Vol. 286, p. 115577
Main Authors: Wei, Xiangwei; Wang, Hao; Tang, Yixuan
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15.10.2023

Summary: In this paper, a novel multi-USV formation path planning algorithm is proposed based on deep reinforcement learning. First, a goal-based hierarchical reinforcement learning algorithm is designed to improve training speed and resolve planning conflicts within the formation. Second, an improved artificial potential field (APF) algorithm is incorporated into the training process to obtain the optimal path planning and obstacle avoidance learning scheme for multi-USVs in the determined perceptual environment. Finally, a formation geometry model is established to describe the physical relationships among USVs, and a composite reward function is proposed to guide the training. Numerous simulation tests are conducted, and the effectiveness of the proposed algorithm is further validated on the NEU-MSV01 experimental platform in combination with parameterized Line of Sight (LOS) guidance.

•A goal-based hierarchical reinforcement learning algorithm is proposed for multi-USV formation path planning, with two levels of division. The upper-level strategy is applied to a virtual navigator responsible for the path planning of the entire formation, while the lower-level strategy, running on each individual USV, maintains the formation of multiple USVs.
•The improved APF algorithm is incorporated into the multi-agent deep reinforcement learning training process. At each decision-making moment, the APF output is calculated from the current state of the USVs and then integrated into the multi-agent deep reinforcement learning decision-making process.
•A formation geometry model is established to describe the physical relationships among USVs, and a composite reward function is proposed for deep reinforcement learning. Multiple formation shapes are designed to address typical real-world tasks for multi-USV systems.
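The abstract does not give the paper's actual equations, gains, or reward weights, but the two main ingredients it names (an APF output blended into the learned policy's decision, and a composite reward combining goal progress, formation keeping, and obstacle avoidance) can be sketched in a minimal, hypothetical form. All function names, gain values (`k_att`, `k_rep`, `rho0`), the blending weight `alpha`, and the reward weights below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=5.0):
    """Classical APF: attractive pull toward the goal plus repulsive
    pushes from obstacles inside the influence radius rho0.
    pos, goal: 2D positions; obstacles: list of 2D positions."""
    f = k_att * (goal - pos)  # attractive term
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 0.0 < d < rho0:
            # standard repulsive gradient, zero outside the radius rho0
            f += k_rep * (1.0 / d - 1.0 / rho0) * (pos - obs) / d**3
    return f

def blended_action(policy_action, pos, goal, obstacles, alpha=0.5):
    """Mix the RL policy's velocity command with the normalized APF
    direction at each decision step (hypothetical blending scheme)."""
    f = apf_force(pos, goal, obstacles)
    norm = np.linalg.norm(f)
    apf_dir = f / norm if norm > 1e-8 else np.zeros_like(f)
    return (1.0 - alpha) * policy_action + alpha * apf_dir

def composite_reward(pos, goal, formation_err, min_obs_dist,
                     w_goal=1.0, w_form=0.5, w_obs=1.0, safe_dist=2.0):
    """Hypothetical composite reward: goal proximity, formation-keeping
    penalty, and a collision-avoidance penalty inside safe_dist."""
    r = -w_goal * np.linalg.norm(goal - pos)   # closer to goal -> larger reward
    r -= w_form * formation_err                # penalize deviation from formation slot
    if min_obs_dist < safe_dist:
        r -= w_obs * (safe_dist - min_obs_dist)  # penalize unsafe proximity
    return r
```

With no obstacles the APF term reduces to the pure attractive pull, so the blended action interpolates between the policy command and the unit vector toward the goal; tuning `alpha` trades off the learned behavior against the analytic safety prior.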
ISSN: 0029-8018, 1873-5258
DOI: 10.1016/j.oceaneng.2023.115577