META-GRADIENT UPDATES FOR TRAINING RETURN FUNCTIONS FOR REINFORCEMENT LEARNING SYSTEMS

Bibliographic Details
Main Authors	SILVER DAVID, HASSELT HADO PHILIP, XU ZHONGWEN
Format	Patent
Language	Chinese, English
Published	29.01.2021
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS
Summary:	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning. The embodiments described herein apply meta-learning (in particular, meta-gradient reinforcement learning) to learn an optimum return function G so that the training of the system is improved. This provides a more effective and efficient means of training a reinforcement learning system: because the return function G is itself trained as learning proceeds, the system can converge more quickly on an optimum set of one or more policy parameters theta. In particular, the return function G is made dependent on the one or more policy parameters theta, and a meta-objective function J' is differentiated with respect to the one or more return parameters eta to improve the training of the return function G.
Bibliography:	Application Number: CN201980033531
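
The two-level update described in the summary can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method claimed in the patent: a single scalar theta (a value estimate standing in for the policy parameters), a single return parameter eta acting as a discount factor, a one-step bootstrapped return G = reward + eta * theta, and a squared-error meta-objective J' evaluated with a fixed reference discount eta_ref. The names meta_gradient_step, run, alpha, beta, and eta_ref are hypothetical and do not come from the record.

```python
def meta_gradient_step(theta, eta, reward, alpha, beta, eta_ref):
    """One inner update of theta followed by one meta-gradient update of eta."""
    # Return function G: depends on theta (via bootstrapping) and on eta.
    g = reward + eta * theta

    # Inner update: move theta toward G, treating G as a fixed target.
    theta_new = theta + alpha * (g - theta)

    # Sensitivity of the updated parameter to eta: d(theta_new)/d(eta).
    dtheta_new_deta = alpha * theta

    # Meta-objective J' = 0.5 * (g_ref - theta_new)^2, with the reference
    # target g_ref (built from the assumed fixed eta_ref) held constant.
    g_ref = reward + eta_ref * theta_new
    dj_dtheta_new = -(g_ref - theta_new)

    # Chain rule: dJ'/d(eta) = dJ'/d(theta_new) * d(theta_new)/d(eta).
    meta_grad = dj_dtheta_new * dtheta_new_deta

    # Gradient descent on eta, clipped so the bootstrapped return stays bounded.
    eta_new = min(max(eta - beta * meta_grad, 0.0), 0.99)
    return theta_new, eta_new


def run(steps=20000, reward=1.0, alpha=0.1, beta=0.001, eta_ref=0.9):
    """Repeat the combined update on a single self-looping state."""
    theta, eta = 0.0, 0.5
    for _ in range(steps):
        theta, eta = meta_gradient_step(theta, eta, reward, alpha, beta, eta_ref)
    return theta, eta


if __name__ == "__main__":
    theta, eta = run()
    print(f"theta={theta:.3f} eta={eta:.3f}")
```

In this toy setting the only stationary point is eta equal to eta_ref with theta equal to reward / (1 - eta_ref), so eta should drift from its initial value toward the reference discount while theta is being trained. A real system would replace the hand-derived derivatives with automatic differentiation over policy and value networks.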