Path planning for multiple agents in an unknown environment using soft actor critic and curriculum learning
Published in: Computer Animation and Virtual Worlds, Vol. 34, No. 1
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc. (Wiley Subscription Services, Inc.), 01.01.2023
Summary: Path planning guarantees that agents reach their goals in an optimal way without colliding with obstacles or with one another, and it is an important component of crowd simulation research. In this article, we propose a novel path planning approach for multiple agents that combines the soft actor critic (SAC) algorithm with curriculum learning to address the problems of a single fixed policy and slow policy convergence in an unknown environment with sparse rewards. The path planning task is arranged as lessons ordered from easy to difficult, and the neural network of the SAC algorithm learns them in sequence until it is fully competent at the path planning task. We also stack the state information to mitigate the problems that limited observation causes for policy learning, and we design a comprehensive reward function so that agents reach their goals successfully while avoiding collisions with static obstacles and other agents. The experimental results demonstrate that our approach plans smooth and natural paths for multiple agents; furthermore, the model has a certain generalization ability and adapts well to changes in a dynamic environment.
In the circle scenario and in a scenario with static obstacles, the model shows good generalization performance without policy retraining. In an emergency scenario, the model adapts to a new change in a dynamic environment, again without retraining.
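The three ingredients named in the abstract (an easy-to-difficult lesson curriculum, stacked state observations, and a composite reward combining goal success, collision avoidance, and progress) can be sketched as below. This is an illustrative sketch only: the lesson parameters, reward weights, stack depth, and promotion threshold are assumptions for demonstration, not values from the paper.

```python
from collections import deque

# Hypothetical curriculum, ordered easy -> difficult: fewer agents and
# obstacles, and nearer goals, in the early lessons.
LESSONS = [
    {"n_agents": 1, "n_obstacles": 0, "goal_dist": 2.0},
    {"n_agents": 2, "n_obstacles": 3, "goal_dist": 5.0},
    {"n_agents": 8, "n_obstacles": 10, "goal_dist": 10.0},
]

def composite_reward(reached_goal, collided, prev_dist, curr_dist,
                     w_goal=10.0, w_collision=-10.0, w_progress=1.0):
    """Goal bonus + collision penalty + dense progress shaping.

    The progress term densifies an otherwise sparse reward signal by
    paying the agent for reducing its distance to the goal each step.
    Weights here are illustrative assumptions.
    """
    r = w_progress * (prev_dist - curr_dist)
    if reached_goal:
        r += w_goal
    if collided:
        r += w_collision
    return r

def stack_states(history, new_state, k=4):
    """Stack the last k observations to compensate for limited observation.

    Early in an episode the stack is padded by repeating the oldest
    observation so the policy input always has a fixed shape.
    """
    history.append(new_state)
    while len(history) > k:
        history.popleft()
    padded = list(history)
    while len(padded) < k:
        padded.insert(0, padded[0])
    return padded

def next_lesson(lesson_idx, success_rate, threshold=0.9):
    """Advance the curriculum once the current lesson is mastered."""
    if success_rate >= threshold and lesson_idx < len(LESSONS) - 1:
        return lesson_idx + 1
    return lesson_idx
```

In a training loop, the SAC policy would be trained on `LESSONS[lesson_idx]` until its episode success rate crosses the threshold, at which point `next_lesson` promotes it to a harder scenario while the network weights carry over; the stacked observation and the shaped reward are what the policy sees at every step.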
Bibliography: Funding information: Jiangsu Modern Agricultural Industry Key Technology Innovation Project, Grant/Award Number CX(20)2013; National Key Research and Development Program, Grant/Award Number 2020YFB160070301; The Key R&D Program of Jiangsu Province, Grant/Award Number BE2019311
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2113