Path planning for multiple agents in an unknown environment using soft actor critic and curriculum learning
Published in: Computer Animation and Virtual Worlds, Vol. 34, No. 1
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc. (Wiley Subscription Services, Inc.), 01.01.2023
Summary: Path planning guarantees that agents reach their goals in an optimal way without colliding with obstacles or with one another, and it is an important component of crowd simulation research. In this article, we propose a novel path planning approach for multiple agents that combines the soft actor critic (SAC) algorithm with curriculum learning to address the problems of a single fixed policy and slow policy convergence in an unknown environment with sparse rewards. The path planning task is arranged as lessons ordered from easy to difficult, and the neural network of the SAC algorithm learns them in sequence until it is fully competent at the path planning task. We also stack the state information to mitigate the problems that limited observation causes for policy learning, and we design a comprehensive reward function so that agents reach their goals successfully while avoiding collisions with static obstacles and other agents. The experimental results demonstrate that our approach plans smooth and natural paths for multiple agents; furthermore, the model has a certain generalization ability and adapts well to changes in a dynamic environment.
In the circle scenario and in a scenario with static obstacles, the model shows good generalization performance without policy retraining. In an emergency scenario, the model adapts to a new change in a dynamic environment, again without retraining.
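The three ingredients named in the abstract (an easy-to-difficult lesson curriculum, stacked state observations, and a composite reward combining goal success, collision avoidance, and progress) can be sketched as below. This is an illustrative sketch only: the lesson parameters, reward weights, stack depth, and promotion threshold are assumptions for demonstration, not values from the paper.

```python
from collections import deque

# Hypothetical curriculum, ordered easy -> difficult: fewer agents and
# obstacles, and nearer goals, in the early lessons.
LESSONS = [
    {"n_agents": 1, "n_obstacles": 0, "goal_dist": 2.0},
    {"n_agents": 2, "n_obstacles": 3, "goal_dist": 5.0},
    {"n_agents": 8, "n_obstacles": 10, "goal_dist": 10.0},
]

def composite_reward(reached_goal, collided, prev_dist, curr_dist,
                     w_goal=10.0, w_collision=-10.0, w_progress=1.0):
    """Goal bonus + collision penalty + dense progress shaping.

    The progress term densifies an otherwise sparse reward signal by
    paying the agent for reducing its distance to the goal each step.
    Weights here are illustrative assumptions.
    """
    r = w_progress * (prev_dist - curr_dist)
    if reached_goal:
        r += w_goal
    if collided:
        r += w_collision
    return r

def stack_states(history, new_state, k=4):
    """Stack the last k observations to compensate for limited observation.

    Early in an episode the stack is padded by repeating the oldest
    observation so the policy input always has a fixed shape.
    """
    history.append(new_state)
    while len(history) > k:
        history.popleft()
    padded = list(history)
    while len(padded) < k:
        padded.insert(0, padded[0])
    return padded

def next_lesson(lesson_idx, success_rate, threshold=0.9):
    """Advance the curriculum once the current lesson is mastered."""
    if success_rate >= threshold and lesson_idx < len(LESSONS) - 1:
        return lesson_idx + 1
    return lesson_idx
```

In a training loop, the SAC policy would be trained on `LESSONS[lesson_idx]` until its episode success rate crosses the threshold, at which point `next_lesson` promotes it to a harder scenario while the network weights carry over; the stacked observation and the shaped reward are what the policy sees at every step.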
Bibliography: Funding information: Jiangsu Modern Agricultural Industry Key Technology Innovation Project, Grant/Award Number CX(20)2013; National Key Research and Development Program, Grant/Award Number 2020YFB160070301; The Key R&D Program of Jiangsu Province, Grant/Award Number BE2019311
ISSN: 1546-4261, 1546-427X
DOI: 10.1002/cav.2113