Actor-Critic--Type Learning Algorithms for Markov Decision Processes
Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two time scale stochastic approximations.

| Published in | SIAM Journal on Control and Optimization, Vol. 38, No. 1, pp. 94–123 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Philadelphia: Society for Industrial and Applied Mathematics, 1999 |
| ISSN | 0363-0129; 1095-7138 |
| DOI | 10.1137/S036301299731669X |
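The abstract describes actor-critic learning driven by simulated MDP transitions, with the critic and actor updated on two different time scales. The following is a minimal illustrative sketch of that general idea, not the paper's actual algorithm: the toy two-state MDP, the softmax policy parameterization, and the specific step-size exponents are all assumptions chosen for the example. The key feature matching the abstract is that the critic's step size `alpha` decays more slowly than the actor's step size `beta`, so `beta/alpha -> 0` (the critic equilibrates on the fast time scale while the actor drifts on the slow one).

```python
import numpy as np

# Illustrative two-time-scale actor-critic on a toy 2-state, 2-action MDP.
# The MDP, policy form, and step sizes are assumptions for this sketch,
# not taken from the paper.

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

def step(s, a):
    """Simulate one transition: action 1 tends to reach state 1, which pays 1."""
    p = 0.9 if a == 1 else 0.1          # probability of landing in state 1
    s2 = 1 if rng.random() < p else 0
    r = 1.0 if s2 == 1 else 0.0
    return s2, r

V = np.zeros(n_states)                   # critic: tabular state values
theta = np.zeros((n_states, n_actions))  # actor: policy parameters

def policy(s):
    """Boltzmann (softmax) policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for t in range(1, 20001):
    pi = policy(s)
    a = rng.choice(n_actions, p=pi)
    s2, r = step(s, a)
    delta = r + gamma * V[s2] - V[s]     # TD error from the simulated transition

    alpha = 1.0 / t ** 0.6               # fast step size (critic)
    beta = 1.0 / t ** 0.8                # slow step size (actor); beta/alpha -> 0

    V[s] += alpha * delta                # critic: asynchronous TD(0) update

    # Actor: policy-gradient-style update driven by the TD error,
    # using grad log pi(a|s) for the softmax policy.
    grad = -pi
    grad[a] += 1.0
    theta[s] += beta * delta * grad

    s = s2

# The learned policy should prefer action 1, which steers toward the reward.
print(policy(0), policy(1))
```

Since action 1 consistently leads to the rewarding state, the TD error is higher on average after action 1, and the slow actor update accumulates that signal until the softmax policy concentrates on action 1 in both states.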