META-GRADIENT UPDATES FOR TRAINING RETURN FUNCTIONS FOR REINFORCEMENT LEARNING SYSTEMS


Bibliographic Details
Main Authors SILVER DAVID, HASSELT HADO PHILIP, XU ZHONGWEN
Format Patent
Language Chinese, English
Published 29.01.2021
Summary: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning. The embodiments described herein apply meta-learning (in particular, meta-gradient reinforcement learning) to learn an optimum return function G so that the training of the system is improved. This provides a more effective and efficient means of training a reinforcement learning system, because the system can converge more quickly on an optimum set of one or more policy parameters theta by training the return function G alongside the policy. In particular, the return function G is made dependent on the one or more policy parameters theta, and a meta-objective function J' is differentiated with respect to the one or more return parameters eta to improve the training of the return function G.
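The update described in the summary can be illustrated with a minimal sketch. It is not the patent's claimed implementation. It assumes a linear value function v_theta(s) = theta . phi(s), a one-step return G_eta = r + eta * v_theta(s') in which the single return parameter eta plays the role of a discount factor, and a meta-objective J' that is the squared error of the updated parameters on one held-out sample. All function names, feature vectors, and step sizes are hypothetical.

```python
import numpy as np

def inner_update(theta, eta, phi, reward, phi_next, alpha):
    """One TD(0)-style step on theta using the return G_eta = r + eta * v_theta(s')."""
    g = reward + eta * (theta @ phi_next)   # return G depends on eta and on theta
    td_error = g - theta @ phi
    return theta + alpha * td_error * phi

def meta_objective(theta_new, phi_val, g_val):
    """J'(theta'): squared error of the updated value estimate on a held-out sample."""
    return 0.5 * (g_val - theta_new @ phi_val) ** 2

def meta_gradient(theta, eta, phi, reward, phi_next, alpha, phi_val, g_val):
    """dJ'/d eta by the chain rule: (dJ'/d theta') . (d theta'/d eta)."""
    theta_new = inner_update(theta, eta, phi, reward, phi_next, alpha)
    dj_dtheta_new = -(g_val - theta_new @ phi_val) * phi_val
    dtheta_new_deta = alpha * (theta @ phi_next) * phi   # only G_eta depends on eta
    return dj_dtheta_new @ dtheta_new_deta

# Hypothetical toy data (assumptions, chosen only for illustration).
theta = np.array([0.5, -0.2])       # policy/value parameters
phi = np.array([1.0, 0.0])          # features of the current state
phi_next = np.array([0.0, 1.0])     # features of the next state
phi_val = np.array([1.0, 0.0])      # features of the held-out state
reward, g_val = 1.0, 2.0            # observed reward and held-out target
alpha, beta, eta = 0.1, 1.0, 0.9    # inner step size, meta step size, return parameter

grad = meta_gradient(theta, eta, phi, reward, phi_next, alpha, phi_val, g_val)
eta_new = eta - beta * grad         # gradient descent on J' with respect to eta

j_before = meta_objective(
    inner_update(theta, eta, phi, reward, phi_next, alpha), phi_val, g_val)
j_after = meta_objective(
    inner_update(theta, eta_new, phi, reward, phi_next, alpha), phi_val, g_val)
print(f"meta-gradient={grad:.5f}  J' before={j_before:.5f}  J' after={j_after:.5f}")
```

In this sketch the inner step moves theta toward the return G_eta, and the outer step adjusts eta so that the resulting theta scores better under J'. A fuller treatment would presumably accumulate d theta / d eta across many updates and use several return parameters; those details are omitted here.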
Bibliography: Application Number: CN201980033531