META-GRADIENT UPDATES FOR TRAINING RETURN FUNCTIONS FOR REINFORCEMENT LEARNING SYSTEMS

Bibliographic Details
Main Authors	Silver, David; Xu, Zhongwen; van Hasselt, Hado Philip
Format	Patent
Language	English
Published	25.03.2021
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS
Summary:	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning. The embodiments described herein apply meta-learning (and in particular, meta-gradient reinforcement learning) to learn an optimum return function G so that the training of the system is improved. This provides a more effective and efficient means of training a reinforcement learning system as the system is able to converge on an optimum set of one or more policy parameters θ more quickly by training the return function G as it goes. In particular, the return function G is made dependent on the one or more policy parameters θ and a meta-objective function J′ is used that is differentiated with respect to the one or more return parameters η to improve the training of the return function G.
Bibliography:	Application Number: US202017112220
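
Illustrative sketch (not taken from the patent text): the summary describes a return function G that depends on the policy parameters θ and on return parameters η, with η trained by differentiating a meta-objective J′. The Python below is one minimal way such an update could look, under several assumptions that the record does not state: a scalar linear value function v_θ(x) = θ·x, a one-step bootstrapped return G_η = r + η·v_θ(x_next) in which η plays the role of a discount factor, a squared-error meta-objective J′ evaluated with a fixed reference value eta_ref, and hand-derived gradients. All function and variable names are hypothetical.

```python
# Minimal meta-gradient sketch under the assumptions stated above.
# Every name here is hypothetical and chosen only for illustration.

def value(theta, x):
    """Assumed linear value function v_theta(x) = theta * x."""
    return theta * x


def one_step_return(eta, theta, reward, x_next):
    """Assumed return G_eta: reward plus the bootstrapped value scaled by eta."""
    return reward + eta * value(theta, x_next)


def inner_update(theta, eta, transition, alpha):
    """One semi-gradient TD step on theta using the return G_eta.

    Also returns d(theta_new)/d(eta), which the meta-update needs.
    """
    x, reward, x_next = transition
    g = one_step_return(eta, theta, reward, x_next)
    td_error = g - value(theta, x)
    theta_new = theta + alpha * td_error * x
    # dG/d(eta) = v_theta(x_next), so d(theta_new)/d(eta) = alpha * v_theta(x_next) * x
    dtheta_deta = alpha * value(theta, x_next) * x
    return theta_new, dtheta_deta


def meta_update(eta, theta_new, dtheta_deta, transition, eta_ref, beta):
    """Gradient step on eta for J' = (G_eta_ref - v_theta_new(x))^2.

    The target G_eta_ref is treated as a constant (an assumption), and the
    chain rule gives dJ'/d(eta) = dJ'/d(theta_new) * d(theta_new)/d(eta).
    """
    x, reward, x_next = transition
    g_ref = one_step_return(eta_ref, theta_new, reward, x_next)
    dj_dtheta = -2.0 * (g_ref - value(theta_new, x)) * x
    return eta - beta * dj_dtheta * dtheta_deta


if __name__ == "__main__":
    theta, eta = 1.0, 0.5
    sample = (1.0, 1.0, 1.0)  # (feature x, reward, next feature), made-up values
    theta, dtheta_deta = inner_update(theta, eta, sample, alpha=0.1)
    eta = meta_update(eta, theta, dtheta_deta, sample, eta_ref=0.9, beta=0.1)
    print("theta:", theta, "eta:", eta)
```

In this toy setting the same transition is reused for the inner update and the meta-objective purely for brevity; evaluating J′ on a separately sampled transition would be the more natural choice.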