Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative m...
Published in | Machine Learning, Vol. 91, no. 3, pp. 325-349 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Boston; Springer US; 01.06.2013; Springer; Springer Nature B.V; Springer Verlag |