META-GRADIENT UPDATES FOR TRAINING RETURN FUNCTIONS FOR REINFORCEMENT LEARNING SYSTEMS

Bibliographic Details
Main Authors Silver, David; Xu, Zhongwen; van Hasselt, Hado Philip
Format Patent
Language English
Published 25.03.2021
Summary: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning. The embodiments described herein apply meta-learning (and in particular, meta-gradient reinforcement learning) to learn an optimum return function G so that the training of the system is improved. This provides a more effective and efficient means of training a reinforcement learning system as the system is able to converge on an optimum set of one or more policy parameters θ more quickly by training the return function G as it goes. In particular, the return function G is made dependent on the one or more policy parameters θ and a meta-objective function J′ is used that is differentiated with respect to the one or more return parameters η to improve the training of the return function G.
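The idea in the summary can be illustrated with a small sketch. It is not the patented method: it assumes that the return parameter η is a single discount factor `gamma`, that θ is a single scalar value estimate `theta` standing in for the policy parameters, and that the meta-objective J′ is a squared error against a return computed with a fixed reference discount `gamma_ref`. All function names, step sizes, and the clipping of `gamma` to [0, 1] are hypothetical choices made for illustration.

```python
def discounted_return(rewards, gamma):
    """Return G = sum_k gamma**k * r_k together with dG/dgamma."""
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    dg = sum(k * gamma ** (k - 1) * r for k, r in enumerate(rewards) if k > 0)
    return g, dg


def meta_gradient_step(theta, gamma, rewards, rewards_next,
                       alpha=0.1, beta=0.01, gamma_ref=1.0):
    """One inner update of theta, then one meta-gradient update of gamma.

    Assumed setup (not taken from the record):
      inner update    theta' = theta + alpha * (G_gamma - theta)
      meta-objective  J' = 0.5 * (G_ref - theta')**2 on a second sample
    """
    # Inner update: move theta toward the return defined by gamma.
    g, dg_dgamma = discounted_return(rewards, gamma)
    theta_new = theta + alpha * (g - theta)

    # How the updated theta depends on the return parameter gamma.
    dtheta_dgamma = alpha * dg_dgamma

    # Meta-objective evaluated on a second sample with a fixed reference return.
    g_ref, _ = discounted_return(rewards_next, gamma_ref)
    dj_dtheta = -(g_ref - theta_new)

    # Chain rule: dJ'/dgamma = dJ'/dtheta' * dtheta'/dgamma.
    meta_grad = dj_dtheta * dtheta_dgamma

    # Gradient descent on J' with respect to gamma, kept inside [0, 1].
    gamma_new = min(1.0, max(0.0, gamma - beta * meta_grad))
    return theta_new, gamma_new


if __name__ == "__main__":
    theta, gamma = 0.0, 0.5
    for _ in range(5):
        theta, gamma = meta_gradient_step(theta, gamma, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
        print(f"theta={theta:.4f} gamma={gamma:.4f}")
```

In this toy setting the return parameter is adjusted online while θ is being trained, which is the sense in which the return function is learned "as it goes"; a real system would use a parameterised policy and sampled trajectories rather than fixed reward lists.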
Bibliography: Application Number: US202017112220