Finite-Time Analysis of Asynchronous Multi-Agent TD Learning
Published in | Proceedings of the American Control Conference pp. 2090 - 2097 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | AACC, 10.07.2024 |
Summary: | Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving N agents, this beneficial effect usually comes in the form of an N-fold linear convergence speedup, i.e., a reduction, proportional to N, in the number of iterations required to reach a certain convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of AsyncMATD, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is providing a finite-time analysis of AsyncMATD, for which we establish a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate. |
---|---|
ISSN: | 2378-5861 |
DOI: | 10.23919/ACC60939.2024.10644447 |
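
The summary above describes a setting in which N agents send local TD(0) update directions, possibly computed at stale parameter copies, to a central aggregator. The following is a minimal sketch of that setting under several assumptions: linear function approximation, a randomly generated Markov chain under the fixed policy, a simple averaging aggregator, and a uniformly random delay bounded by `TAU_MAX`. All constants, names, and the exact delay model here are illustrative assumptions, not the paper's actual AsyncMATD specification or its analysis conditions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not taken from the paper)
N_AGENTS = 8      # number of cooperating agents
N_STATES = 20     # states of the common Markov chain under the evaluated policy
DIM = 5           # dimension of the linear feature map
TAU_MAX = 3       # bound on the asynchronous delay (in iterations)
ALPHA = 0.05      # constant step size
GAMMA = 0.9       # discount factor

# Shared environment model under the evaluated policy (random instance)
P = rng.dirichlet(np.ones(N_STATES), size=N_STATES)   # row-stochastic transition matrix
r = rng.uniform(0.0, 1.0, size=N_STATES)              # expected rewards
Phi = rng.normal(size=(N_STATES, DIM))                # feature matrix

def local_td_direction(theta, state):
    """One TD(0) update direction computed by a single agent from one transition."""
    next_state = rng.choice(N_STATES, p=P[state])
    td_error = r[state] + GAMMA * Phi[next_state] @ theta - Phi[state] @ theta
    return td_error * Phi[state], next_state

# The aggregator keeps the last TAU_MAX + 1 iterates so that agents' directions
# can be evaluated at stale (delayed) copies of the parameter vector.
theta = np.zeros(DIM)
history = [theta.copy()]
states = rng.integers(N_STATES, size=N_AGENTS)  # each agent's current state

for t in range(500):
    directions = np.zeros((N_AGENTS, DIM))
    for i in range(N_AGENTS):
        # Each agent's direction may use a parameter copy that is up to TAU_MAX
        # iterations old (time-varying, bounded delay).
        delay = rng.integers(0, min(TAU_MAX, len(history) - 1) + 1)
        stale_theta = history[-1 - delay]
        directions[i], states[i] = local_td_direction(stale_theta, states[i])

    # The aggregator averages the N local directions; averaging over N agents is
    # the mechanism usually behind the N-fold linear speedup discussed above.
    theta = theta + ALPHA * directions.mean(axis=0)
    history.append(theta.copy())
    if len(history) > TAU_MAX + 1:
        history.pop(0)

print("final parameter estimate:", theta)
```

This sketch applies all N delayed directions in a single synchronized aggregation step for simplicity; the paper's algorithm and its finite-time guarantees may handle arrival order, delay sequences, and step-size choices differently.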