Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative m...

Bibliographic Details
Published in Machine Learning, Vol. 91, no. 3, pp. 325-349
Main Authors Gheshlaghi Azar, Mohammad; Munos, Rémi; Kappen, Hilbert J.
Format Journal Article
Language English
Published Boston Springer US 01.06.2013
Springer
Springer Nature B.V
Springer Verlag