Learning Locally, Communicating Globally: Reinforcement Learning of Multi-robot Task Allocation for Cooperative Transport

Bibliographic Details
Published in IFAC-PapersOnLine, Vol. 56, no. 2, pp. 11436-11443
Main Authors Shibata, Kazuki; Jimbo, Tomohiko; Odashima, Tadashi; Takeshita, Keisuke; Matsubara, Takamitsu
Format Journal Article
Language English
Published Elsevier Ltd 01.01.2023
Summary: We consider task allocation for multi-object transport using a multi-robot system, in which each robot selects one object among multiple objects with different and unknown weights. Existing centralized methods assume a fixed number of robots and tasks, which makes them inapplicable to scenarios that differ from the learning environment. Meanwhile, existing distributed methods limit the minimum number of robots and tasks to a constant value, making them applicable to various numbers of robots and tasks. However, they cannot transport an object whose weight exceeds the load capacity of the robots observing the object. To make task allocation applicable to various numbers of robots and objects with different and unknown weights, we propose a framework using multi-agent reinforcement learning. First, we introduce a structured policy model consisting of 1) predesigned dynamic task priorities with global communication and 2) a neural network-based distributed policy model that determines the timing for coordination. The distributed policy builds consensus on the high-priority object under local observations and selects cooperative or independent actions. Then, the policy is optimized by multi-agent reinforcement learning through trial and error. This structured policy of local learning and global communication makes our framework applicable to various numbers of robots and objects with different and unknown weights, as demonstrated by simulations.
ISSN: 2405-8963
DOI: 10.1016/j.ifacol.2023.10.431
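
The structured policy described in the summary can be illustrated with a toy sketch. This is not the authors' implementation: the `Robot` class, the `allocate` and `can_lift` functions, the fixed priority list, and the `cooperate` callable (a stand-in for the learned neural network policy) are all hypothetical names and simplifications introduced here. The sketch only shows why choosing between cooperative and independent actions matters when an object is heavier than the capacity of the robots that observe it.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    name: str
    capacity: float  # load capacity, in arbitrary units (assumed)
    visible: set     # ids of the objects this robot observes locally


def allocate(robots, priorities, cooperate):
    """Return {robot name: object id or None}.

    priorities: object ids ordered from highest to lowest priority, assumed
                to be shared with every robot through global communication.
    cooperate:  callable(robot) -> bool, a placeholder for the learned
                distributed policy that decides when to coordinate.
    """
    top = priorities[0]
    assignment = {}
    for robot in robots:
        if cooperate(robot):
            # Cooperative action: join the globally agreed high-priority object.
            assignment[robot.name] = top
        else:
            # Independent action: best-ranked object among those observed locally.
            local = [obj for obj in priorities if obj in robot.visible]
            assignment[robot.name] = local[0] if local else None
    return assignment


def can_lift(assignment, robots, weights):
    """Simulator-side check: is the combined capacity on each object sufficient?

    The weights are used only here; the robots never read them.
    """
    total = {}
    for robot in robots:
        obj = assignment[robot.name]
        if obj is not None:
            total[obj] = total.get(obj, 0.0) + robot.capacity
    return {obj: cap >= weights[obj] for obj, cap in total.items()}


robots = [
    Robot("r1", 1.0, {"a"}),
    Robot("r2", 1.0, {"b"}),
    Robot("r3", 1.0, {"b"}),
]
weights = {"a": 2.5, "b": 1.0}  # unknown to the robots
priorities = ["a", "b"]

independent = allocate(robots, priorities, lambda r: False)
cooperative = allocate(robots, priorities, lambda r: True)
print(can_lift(independent, robots, weights))  # {'a': False, 'b': True}
print(can_lift(cooperative, robots, weights))  # {'a': True}
```

With purely independent actions, object `a` is observed by a single robot and cannot be lifted. When all robots take the cooperative action, their combined capacity is enough. In the framework described above, the decision made here by `cooperate` would be learned through multi-agent reinforcement learning rather than hard-coded.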