Finite-Time Analysis of Asynchronous Multi-Agent TD Learning

Bibliographic Details
Published in: Proceedings of the American Control Conference, pp. 2090-2097
Main Authors: Dal Fabbro, Nicolo; Adibi, Arman; Mitra, Aritra; Pappas, George J.
Format: Conference Proceeding
Language: English
Published: AACC, 10.07.2024

Summary: Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving N agents, this beneficial effect usually comes in the form of an N-fold linear convergence speedup, i.e., a reduction, proportional to N, in the number of iterations required to reach a given convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of AsyncMATD, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is a finite-time analysis of AsyncMATD, for which we establish a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate.
ISSN: 2378-5861
DOI: 10.23919/ACC60939.2024.10644447
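
To make the setting described in the summary concrete, below is a minimal illustrative sketch (not the authors' implementation or the exact AsyncMATD algorithm) of asynchronous multi-agent TD(0) with a central aggregator: each of N agents samples transitions from the same Markov reward process, computes a local TD update direction, and the aggregator applies the average of directions that arrive with a bounded, time-varying delay. All names and parameters here (the toy random-walk MRP, `max_delay`, step size, tabular features) are assumptions made only for this example; averaging the N local directions is what underlies the intuition for an N-fold variance reduction and speedup.

```python
# Illustrative sketch (assumptions only): asynchronous multi-agent TD(0)
# with linear value-function approximation and bounded delays.
import numpy as np

rng = np.random.default_rng(0)

# Toy MRP: a 5-state random walk under a fixed (already chosen) policy.
n_states, gamma = 5, 0.9
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n_states - 1)] += 0.5
r = np.linspace(0.0, 1.0, n_states)        # state-dependent rewards
phi = np.eye(n_states)                     # tabular features (special case of linear FA)

def td_direction(theta, s, s_next, reward):
    """Local TD(0) update direction g = (r + gamma*V(s') - V(s)) * phi(s)."""
    delta = reward + gamma * phi[s_next] @ theta - phi[s] @ theta
    return delta * phi[s]

# Asynchronous multi-agent loop with bounded delays (max_delay is an assumed bound).
N, T, max_delay, alpha = 8, 2000, 3, 0.1
theta = np.zeros(n_states)
states = rng.integers(0, n_states, size=N)  # each agent runs its own chain
buffers = [[] for _ in range(N)]            # in-flight (delayed) update directions

for t in range(T):
    # Each agent samples a transition and sends its direction with a random delay.
    for i in range(N):
        s = states[i]
        s_next = rng.choice(n_states, p=P[s])
        g = td_direction(theta, s, s_next, r[s])
        buffers[i].append((t + rng.integers(0, max_delay + 1), g))
        states[i] = s_next
    # The central aggregator averages every direction whose delay has elapsed.
    arrived = []
    for i in range(N):
        ready = [g for (due, g) in buffers[i] if due <= t]
        buffers[i] = [(due, g) for (due, g) in buffers[i] if due > t]
        arrived.extend(ready)
    if arrived:
        theta += alpha * np.mean(arrived, axis=0)

print("estimated state values:", np.round(phi @ theta, 3))
```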