Finite-Time Analysis of Asynchronous Multi-Agent TD Learning

Bibliographic Details
Published in: Proceedings of the American Control Conference, pp. 2090-2097
Main Authors: Dal Fabbro, Nicolo; Adibi, Arman; Mitra, Aritra; Pappas, George J.
Format: Conference Proceeding
Language: English
Published: AACC, 10.07.2024

Summary: Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving N agents, this beneficial effect usually comes in the form of an N-fold linear convergence speedup, i.e., a reduction, proportional to N, in the number of iterations required to reach a given convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of AsyncMATD, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is a finite-time analysis of AsyncMATD, for which we establish a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate.
ISSN: 2378-5861
DOI: 10.23919/ACC60939.2024.10644447
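
To make the setting described in the summary concrete, below is a minimal illustrative sketch (not the authors' implementation or the exact AsyncMATD algorithm) of asynchronous multi-agent TD(0) with a central aggregator: each of N agents samples transitions from the same Markov reward process, computes a local TD update direction, and the aggregator applies the average of directions that arrive with a bounded, time-varying delay. All names and parameters here (the toy random-walk MRP, `max_delay`, step size, tabular features) are assumptions made only for this example; averaging the N local directions is what underlies the intuition for an N-fold variance reduction and speedup.

```python
# Illustrative sketch (assumptions only): asynchronous multi-agent TD(0)
# with linear value-function approximation and bounded delays.
import numpy as np

rng = np.random.default_rng(0)

# Toy MRP: a 5-state random walk under a fixed (already chosen) policy.
n_states, gamma = 5, 0.9
P = np.zeros((n_states, n_states))
for s in range(n_states):
    P[s, max(s - 1, 0)] += 0.5
    P[s, min(s + 1, n_states - 1)] += 0.5
r = np.linspace(0.0, 1.0, n_states)        # state-dependent rewards
phi = np.eye(n_states)                     # tabular features (special case of linear FA)

def td_direction(theta, s, s_next, reward):
    """Local TD(0) update direction g = (r + gamma*V(s') - V(s)) * phi(s)."""
    delta = reward + gamma * phi[s_next] @ theta - phi[s] @ theta
    return delta * phi[s]

# Asynchronous multi-agent loop with bounded delays (max_delay is an assumed bound).
N, T, max_delay, alpha = 8, 2000, 3, 0.1
theta = np.zeros(n_states)
states = rng.integers(0, n_states, size=N)  # each agent runs its own chain
buffers = [[] for _ in range(N)]            # in-flight (delayed) update directions

for t in range(T):
    # Each agent samples a transition and sends its direction with a random delay.
    for i in range(N):
        s = states[i]
        s_next = rng.choice(n_states, p=P[s])
        g = td_direction(theta, s, s_next, r[s])
        buffers[i].append((t + rng.integers(0, max_delay + 1), g))
        states[i] = s_next
    # The central aggregator averages every direction whose delay has elapsed.
    arrived = []
    for i in range(N):
        ready = [g for (due, g) in buffers[i] if due <= t]
        buffers[i] = [(due, g) for (due, g) in buffers[i] if due > t]
        arrived.extend(ready)
    if arrived:
        theta += alpha * np.mean(arrived, axis=0)

print("estimated state values:", np.round(phi @ theta, 3))
```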