Distributed training using off-policy actor-critic reinforcement learning

Bibliographic Details
Main Authors MINY VALERIE, WARD TRACY, FIROI VINCENZO, DORON YEHUDA, KAVAKULOGLU KATERINA, DUNNING IAN, SAWYER HOWARD J, ESPEHOLT LASSE, HARLEY THOMAS J. A, SIMONYAN KAREN, MUNOZ RODOLFO
Format Patent
Language Chinese, English
Published 18.06.2024
Summary: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network for selecting an action to be performed by an agent interacting with an environment. In one aspect, a system includes a plurality of actor computing units and a plurality of learner computing units. An actor computing unit generates an experience tuple trajectory, and a learner computing unit uses the experience tuple trajectory to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor-critic reinforcement learning technique.
Bibliography: Application Number: CN202410384665
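
The actor/learner split described in the summary can be illustrated with a small single-process sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `Experience` fields, the toy two-state environment in `toy_reward`, the tabular softmax policy, the periodic parameter sync, and the truncated importance weight `rho`, which is one common way to correct for actors acting with stale policy copies.

```python
import math
import random
from collections import namedtuple

# Hypothetical experience tuple; the field names are illustrative assumptions.
Experience = namedtuple("Experience", "state action reward behaviour_prob")

N_STATES, N_ACTIONS = 2, 2

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_reward(state, action):
    # Toy environment (assumption): reward 1 when the action matches the state.
    return 1.0 if action == state else 0.0

def actor_rollout(actor_logits, length, rng):
    """Actor: act with its own (possibly stale) policy copy and record a trajectory."""
    trajectory = []
    for _ in range(length):
        state = rng.randrange(N_STATES)
        probs = softmax(actor_logits[state])
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]
        trajectory.append(
            Experience(state, action, toy_reward(state, action), probs[action]))
    return trajectory

def learner_update(logits, values, trajectory, lr=0.1, rho_max=1.0):
    """Learner: one off-policy actor-critic step per experience tuple."""
    for exp in trajectory:
        probs = softmax(logits[exp.state])
        # Truncated importance weight: learner policy prob / actor policy prob.
        rho = min(rho_max, probs[exp.action] / exp.behaviour_prob)
        # One-step episodes, so the advantage needs no bootstrapped value.
        advantage = exp.reward - values[exp.state]
        for a in range(N_ACTIONS):
            grad_log_pi = (1.0 if a == exp.action else 0.0) - probs[a]
            logits[exp.state][a] += lr * rho * advantage * grad_log_pi
        values[exp.state] += lr * rho * advantage  # critic (value) update

def train(num_actors=4, rounds=300, rollout_len=8, sync_every=5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # learner policy
    values = [0.0] * N_STATES                              # learner critic
    actor_params = [[row[:] for row in logits] for _ in range(num_actors)]
    for step in range(rounds):
        for i in range(num_actors):
            if step % sync_every == 0:
                # Actors refresh their copy only occasionally, so they lag.
                actor_params[i] = [row[:] for row in logits]
            trajectory = actor_rollout(actor_params[i], rollout_len, rng)
            learner_update(logits, values, trajectory)
    return logits, values
```

Calling `train()` returns the learner's policy logits and value estimates. In a real distributed system the actors and learners would run on separate computing units and exchange trajectories and parameters over a network; the loop above only simulates that lag in one process.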