A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Bibliographic Details
Published in	IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews, Vol. 42, no. 6, pp. 1291-1307
Main Authors	Grondman, I., Busoniu, L., Lopes, G. A. D., Babuska, R.
Format	Journal Article
Language	English
Published	New York, NY: IEEE (Institute of Electrical and Electronics Engineers), 01.11.2012
Subjects	Actor-critic | Algorithmics. Computability. Computer arithmetics | Algorithms | Applied sciences | Approximation algorithms | Approximation methods | Artificial intelligence | Automatic | Automatic Control Engineering | Computer Science | Computer science; control theory; systems | Control theory. Systems | Convergence | Engineering Sciences | Equations | Estimates | Exact sciences and technology | Learning | Machine Learning | natural gradient | Optimization | Policies | policy gradient | Power control | Reinforcement | reinforcement learning (RL) | Robotics | Searching | Theoretical computing | Concept learning | Action | Gradient | Finance | Reinforcement learning | Optimal policy | Algorithmics | State space | State space method | Function approximation | Variance | Search algorithm | Gradient descent | Biomimetics | Learning algorithm

More Information
Summary:	Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
ISSN:	1094-6977, 1558-2442
DOI:	10.1109/TSMCC.2012.2218595