Perception-Aware Based UAV Trajectory Planner via Generative Adversarial Self-Imitation Learning From Demonstrations



Bibliographic Details
Published in IEEE Internet of Things Journal, p. 1
Main Authors Zhang, Hanxuan; Huo, Ju; Huang, Yulong; Cheng, Jiajun; Li, Xiaofeng
Format Journal Article
Language English
Published IEEE 08.10.2024

Summary: The use of unmanned aerial vehicles (UAVs) for Internet of Things applications, like intelligent monitoring and search, is increasingly becoming a popular research focus globally. While various optimization algorithms exist to plan UAV flight paths, they frequently compromise the quality of the planning path to decrease planning time. In view of the above problems, a perception-aware based UAV trajectory planner via generative adversarial self-imitation learning from demonstration is proposed. Firstly, a progressively growing discriminator is devised to prevent the policy network from being overpowered in early training stages, avoiding potential training failures. Secondly, the issue of homogenized strategic patterns among optimized expert trajectories is solved by incorporating successful trajectories from the policy network into the expert buffer, which thereby enhances the network's generalization capabilities. Thirdly, to address the challenges of skewed distribution and considerable performance variation among the strategies learned by the policy network during training, a class-level instance-balancing expert buffer is introduced. Finally, the yaw angle of the UAV in real time during flight is obtained by using the analytical solution of the position trajectory and yaw angle and the position trajectory output from the policy network. Experiments confirm our proposed method achieves comparable flight costs and success rates to those of the reference expert method, while the planning time is reduced. The proposed method is also shown to be well adapted to dynamic environments and obstacle trajectories, which are not involved in training. Additionally, the ablation studies highlight the individual contributions of each component within the proposed method.
ISSN: 2327-4662
DOI: 10.1109/JIOT.2024.3477450