META-GRADIENT UPDATES FOR TRAINING RETURN FUNCTIONS FOR REINFORCEMENT LEARNING SYSTEMS
Format | Patent |
---|---|
Language | Chinese, English |
Published | 29.01.2021 |
Summary: | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning. The described embodiments apply meta-learning (in particular, meta-gradient reinforcement learning) to learn an optimal return function G, which improves the training of the system. Because the return function G is trained alongside the policy, the system converges more quickly on an optimal set of one or more policy parameters θ, making training more effective and efficient. In particular, the return function G is made dependent on the one or more policy parameters θ, and a meta-objective function J' is differentiated with respect to the one or more return parameters η to improve the training of the return function G. |
---|---|
Bibliography: | Application Number: CN201980033531 |
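
The update described in the summary can be illustrated with a minimal sketch. It is not the patented method. It makes several simplifying assumptions that do not come from the record: a single scalar value estimate `theta` stands in for the policy parameters, the return parameters η reduce to one discount factor `gamma`, the return is an n-step return that bootstraps from `theta` (which is what makes G depend on θ here), the meta-objective J' is a squared error against a return computed with a fixed reference discount `gamma_bar`, and all function names and step sizes are hypothetical.

```python
def n_step_return(rewards, gamma, bootstrap):
    """G = sum_k gamma^k * r_k + gamma^n * bootstrap, for a non-empty reward list."""
    n = len(rewards)
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted + gamma ** n * bootstrap


def d_return_d_gamma(rewards, gamma, bootstrap):
    """Analytic derivative of the n-step return with respect to gamma."""
    n = len(rewards)
    d_rewards = sum(k * gamma ** (k - 1) * r for k, r in enumerate(rewards) if k > 0)
    return d_rewards + n * gamma ** (n - 1) * bootstrap


def meta_gradient_step(theta, gamma, traj, next_traj,
                       alpha=0.1, beta=0.01, gamma_bar=0.9):
    """One inner update of theta followed by one meta update of gamma.

    Assumed setup (illustrative only): semi-gradient squared-error update
    for theta, and a squared-error meta-objective J' that is minimised.
    """
    # Inner update: move theta toward the return defined by the current gamma.
    g = n_step_return(traj, gamma, theta)
    theta_new = theta + alpha * (g - theta)

    # Sensitivity of the updated parameter to the return parameter:
    # d(theta_new)/d(gamma) = alpha * dG/d(gamma), with the target held fixed.
    dtheta_dgamma = alpha * d_return_d_gamma(traj, gamma, theta)

    # Meta-objective J' on a later trajectory, using a fixed reference discount.
    g_bar = n_step_return(next_traj, gamma_bar, theta_new)
    meta_loss = 0.5 * (g_bar - theta_new) ** 2
    dj_dtheta = -(g_bar - theta_new)  # g_bar is treated as a fixed target

    # Chain rule: dJ'/d(gamma) = dJ'/d(theta_new) * d(theta_new)/d(gamma).
    dj_dgamma = dj_dtheta * dtheta_dgamma

    # Meta update, clipped so gamma stays a valid discount factor.
    gamma_new = min(max(gamma - beta * dj_dgamma, 0.0), 1.0)
    return theta_new, gamma_new, meta_loss


if __name__ == "__main__":
    theta, gamma = 0.0, 0.5
    for _ in range(5):
        theta, gamma, loss = meta_gradient_step(theta, gamma, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
        print(f"theta={theta:.4f} gamma={gamma:.4f} meta_loss={loss:.4f}")
```

In this sketch the return parameter is adjusted online, in the same loop that updates `theta`, which mirrors the idea of training the return function "as it goes". A practical system would more likely use automatic differentiation than hand-written derivatives.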