Actor-Critic-Type Learning Algorithms for Markov Decision Processes

Bibliographic Details
Published in SIAM Journal on Control and Optimization Vol. 38; no. 1; pp. 94-123
Main Authors Konda, Vijaymohan R., Borkar, Vivek S.
Format Journal Article
Language English
Published Philadelphia: Society for Industrial and Applied Mathematics, 1999

Summary: Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two time scale stochastic approximations.
ISSN: 0363-0129; 1095-7138
DOI: 10.1137/S036301299731669X
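The summary describes learning from simulated MDP transitions with a two time scale stochastic approximation analysis: a critic estimating values of the current policy on a faster time scale, and an actor adjusting the policy on a slower one. As a rough illustration of that general idea only (not the specific algorithm, parameterization, or step-size schedules analyzed in the paper), the sketch below runs a tabular actor-critic on a small randomly generated MDP; the softmax policy, step-size exponents, and all names are assumptions chosen for illustration.

    # Minimal two-time-scale actor-critic sketch (illustrative assumptions only;
    # not the algorithm analyzed in Konda & Borkar 1999).
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical small MDP: random transition kernel P[s, a, s'] and rewards R[s, a].
    n_states, n_actions = 4, 2
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    gamma = 0.95  # discount factor (assumed)

    V = np.zeros(n_states)                   # critic: value-function estimates
    theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters

    def policy(s, theta):
        """Softmax (Boltzmann) policy over actions in state s."""
        prefs = theta[s] - theta[s].max()
        p = np.exp(prefs)
        return p / p.sum()

    s = 0
    for k in range(1, 50_001):
        # Two time scales: the critic step size decays more slowly than the
        # actor's, so the critic tracks the value of the slowly changing policy.
        beta_k = 1.0 / k**0.6   # critic (fast) step size
        alpha_k = 1.0 / k       # actor (slow) step size

        pi_s = policy(s, theta)
        a = rng.choice(n_actions, p=pi_s)
        s_next = rng.choice(n_states, p=P[s, a])
        r = R[s, a]

        # Critic: TD(0) update of the value estimate.
        delta = r + gamma * V[s_next] - V[s]
        V[s] += beta_k * delta

        # Actor: move the policy parameters along the TD-error-weighted
        # log-policy gradient (a common actor-critic update rule).
        grad_log = -pi_s
        grad_log[a] += 1.0
        theta[s] += alpha_k * delta * grad_log

        s = s_next

    print("Greedy policy:", theta.argmax(axis=1))
    print("Value estimates:", np.round(V, 3))

The separation of step sizes (here k**-0.6 for the critic versus 1/k for the actor) is what makes the coupled iteration a two time scale stochastic approximation: the actor sees an almost-converged critic, while the critic sees an almost-stationary policy.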