Robust Reinforcement Learning

Bibliographic Details
Published in	Neural computation Vol. 17; no. 2; pp. 335 - 359
Main Authors	Morimoto, Jun; Doya, Kenji
Format	Journal Article
Language	English
Published	MIT Press (MIT Press Journals), One Rogers Street, Cambridge, MA 02142-1209, USA, 01.02.2005
Subjects	Algorithms, Applied sciences, Artificial intelligence, Calculus of variations and optimal control, Computer science; control theory; systems, Control theory. Systems, Exact sciences and technology, Learning, Learning and adaptive systems, Letters, Mathematical analysis, Mathematics, Neural networks, Neural Networks (Computer), Optimal control, Reinforcement (Psychology), Sciences and techniques of general use, Simulation, Inversed pendulum, Neural computation, Error estimation, Gaussian network, Reinforcement learning, H infinite control, Online algorithm, Neural network, Modeling, Value function, Estimating function, Friction, Differential game, Planning, Learning algorithm
Summary:	This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both off-line learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a “disturbing” agent tries to make the worst possible disturbance while a “control” agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
Bibliography:	February, 2005
ISSN:	0899-7667, 1530-888X
DOI:	10.1162/0899766053011528
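
The min-max formulation described in the summary can be illustrated on a toy problem. The sketch below is not taken from the article: it assumes a hypothetical scalar linear system dx/dt = a*x + b*u + d*w and a quadratic running cost q*x^2 + r*u^2 - gamma^2*w^2 (a cost to minimize rather than the reward used in the letter), and it solves the resulting game Riccati equation in closed form. All names (solve_scalar_game, a, b, d, q, r, gamma) are illustrative assumptions. It shows how the "control" agent's best input and the "disturbing" agent's worst disturbance both follow from one value function V(x) = p*x^2.

```python
import math


def solve_scalar_game(a, b, d, q, r, gamma):
    """Closed-form min-max solution for a hypothetical scalar linear game.

    Assumed setup (illustrative, not the article's model):
      dynamics      dx/dt = a*x + b*u + d*w
      running cost  q*x**2 + r*u**2 - gamma**2 * w**2
    The controller u minimizes the cost and the disturbance w maximizes it.
    With V(x) = p*x**2, the stationarity conditions give
      u* = -(b*p/r) * x   and   w* = (d*p/gamma**2) * x,
    and p solves  q + 2*a*p - s*p**2 = 0  with  s = b**2/r - d**2/gamma**2.
    """
    s = b**2 / r - d**2 / gamma**2
    if s <= 0:
        # The disturbance is penalized too weakly for a finite value to exist.
        raise ValueError("gamma is too small for this system")
    p = (a + math.sqrt(a**2 + s * q)) / s  # positive (stabilizing) root
    k_u = -b * p / r          # best control gain:       u* = k_u * x
    k_w = d * p / gamma**2    # worst disturbance gain:  w* = k_w * x
    return p, k_u, k_w


if __name__ == "__main__":
    # Smaller gamma means the design guards against stronger disturbances.
    for gamma in (1.5, 2.0, 10.0):
        p, k_u, k_w = solve_scalar_game(a=0.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=gamma)
        print(f"gamma={gamma}: p={p:.4f}, k_u={k_u:.4f}, k_w={k_w:.4f}")
```

As gamma grows, the disturbance term vanishes and p approaches the ordinary linear-quadratic solution, which is one way to read the abstract's remark that the learned solution coincided with linear control theory in the linear domain.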