A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics,...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on systems, man and cybernetics. Part C, Applications and reviews Vol. 42; no. 6; pp. 1291 - 1307
Main Authors Grondman, I., Busoniu, L., Lopes, G. A. D., Babuska, R.
Format Journal Article
LanguageEnglish
Published New-York, NY IEEE 01.11.2012
Institute of Electrical and Electronics Engineers
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
Bibliography:ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
ISSN:1094-6977
1558-2442
DOI:10.1109/TSMCC.2012.2218595