Multi-Agent Path Finding Method Based on Evolutionary Reinforcement Learning

Bibliographic Details
Published in: Chinese Control Conference, pp. 5728-5733
Main Authors: Shi, Qinru; Liu, Meiqin; Zhang, Senlin; Zheng, Ronghao; Lan, Xuguang
Format: Conference Proceeding
Language: English
Published: Technical Committee on Control Theory, Chinese Association of Automation, 28.07.2024

Summary: The multi-agent path finding (MAPF) problem is crucial to improving the efficiency of warehouse systems. Compared with traditional centralized methods, whose computational complexity escalates with scale, reinforcement learning-based methods have proven effective for solving the MAPF problem. Nevertheless, in complex and large-scale scenarios, the policies learned by existing reinforcement learning-based methods are generally inadequate to address the challenges effectively. By leveraging the concepts of policy evaluation and policy evolution, this paper aims to improve performance and sample efficiency. Consequently, we introduce an MAPF method based on evolutionary reinforcement learning. In particular, we design a collaborative policy network model based on reinforcement learning. Thereafter, a novel evolutionary reinforcement learning training framework is constructed. Through a quantitative evaluation mechanism, policies are evaluated, and an evolutionary algorithm is used for policy evolution, so that the collaborative policy can better guide the agents to complete the path finding task. We test on high-density warehouse environment instances of various map sizes, and the experimental results show that our method achieves a high success rate and a low average number of steps.
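
The summary describes an evaluate-then-evolve training pattern: candidate policies are scored by a quantitative evaluation mechanism, and an evolutionary algorithm improves the population. As a rough illustration of that generic evolutionary-RL pattern only (not the paper's actual implementation), the sketch below shows one generation of a population loop in PyTorch; the PolicyNet architecture, the rollout-return fitness function, the mutation operator, and the gym-style environment interface are all assumptions introduced here for illustration.

```python
import copy
import random

import torch
import torch.nn as nn

# Illustrative sketch of a generic evolutionary RL generation; all names
# and design choices are assumptions, not the paper's implementation.
# Assumes a gym-style env with the classic API:
#   obs = env.reset(); obs, reward, done, info = env.step(action)

class PolicyNet(nn.Module):
    """Small feed-forward policy mapping observations to action logits."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def evaluate(policy: PolicyNet, env, episodes: int = 3) -> float:
    """Quantitative evaluation: mean episode return over a few rollouts."""
    total = 0.0
    for _ in range(episodes):
        obs, done, ep_ret = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            action = torch.argmax(logits).item()  # greedy during evaluation
            obs, reward, done, _ = env.step(action)
            ep_ret += reward
        total += ep_ret
    return total / episodes

def mutate(policy: PolicyNet, sigma: float = 0.02) -> PolicyNet:
    """Policy evolution operator: Gaussian perturbation of all parameters."""
    child = copy.deepcopy(policy)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(sigma * torch.randn_like(p))
    return child

def erl_generation(population, env, elite_frac: float = 0.5):
    """One generation: score fitness, keep elites, refill by mutating elites."""
    scored = sorted(population, key=lambda pi: evaluate(pi, env), reverse=True)
    n_elite = max(1, int(elite_frac * len(population)))
    elites = scored[:n_elite]
    children = [mutate(random.choice(elites))
                for _ in range(len(population) - n_elite)]
    return elites + children
```

In full evolutionary RL methods, a population loop like this is typically interleaved with gradient-based RL updates of one or more member policies; the paper's collaborative policy network and its training framework would plausibly occupy that gradient stage, but the abstract does not give those details.
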
ISSN: 1934-1768
DOI: 10.23919/CCC63176.2024.10661475