Action selection neural network training using imitation learning in latent space


Bibliographic Details
Main Authors	Colmenarejo, Sergio Gomez; van den Oord, Aaron Gerard Antonius; Vinyals, Oriol; Aytar, Yusuf; Pfaff, Tobias; Paine, Tom; Reed, Scott Ellison; Novikov, Alexander; Wang, Ziyu; Budden, David
Format	Patent
Language	English
Published	30.05.2023
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

More Information
Summary:	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network, wherein the action selection policy neural network is configured to process an observation characterizing a state of an environment to generate an action selection policy output, wherein the action selection policy output is used to select an action to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining an observation characterizing a state of the environment subsequent to the agent performing a selected action; generating a latent representation of the observation; processing the latent representation of the observation using a discriminator neural network to generate an imitation score; determining a reward from the imitation score; and adjusting the current values of the action selection policy neural network parameters based on the reward using a reinforcement learning training technique.
Bibliography:	Application Number: US201916586437
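
The training step described in the Summary (observe, encode to a latent representation, score with a discriminator, convert the score to a reward, update the policy) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the linear encoder, the logistic discriminator, the log-score reward, the toy environment, and the use of a plain REINFORCE update as the "reinforcement learning training technique". Training of the discriminator itself is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, NUM_ACTIONS = 8, 4, 3  # hypothetical sizes

# Hypothetical stand-ins for the networks named in the summary.
W_enc = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.5  # latent encoder weights
w_disc = rng.normal(size=LATENT_DIM) * 0.5            # discriminator weights
W_pi = np.zeros((OBS_DIM, NUM_ACTIONS))               # policy parameters


def encode(obs):
    """Map an observation to a latent representation (assumed: linear + tanh)."""
    return np.tanh(obs @ W_enc)


def imitation_score(latent):
    """Discriminator output in (0, 1) (assumed: logistic regression)."""
    return 1.0 / (1.0 + np.exp(-(latent @ w_disc)))


def reward_from_score(score, eps=1e-8):
    """Assumed reward shaping: log of the score, so higher scores earn more."""
    return np.log(score + eps)


def policy_probs(obs, W):
    """Softmax over linear logits: the action selection policy output."""
    logits = obs @ W
    logits = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def toy_env_step(obs, action):
    """Hypothetical environment: the chosen action bumps one observation entry."""
    nxt = 0.9 * obs
    nxt[action] += 1.0
    return nxt


def train_step(obs, W, lr=0.1):
    """One pass through the steps listed in the summary."""
    probs = policy_probs(obs, W)
    action = rng.choice(NUM_ACTIONS, p=probs)   # agent selects an action
    next_obs = toy_env_step(obs, action)        # observation after the action
    latent = encode(next_obs)                   # latent representation
    score = imitation_score(latent)             # imitation score
    reward = reward_from_score(score)           # reward from the score
    # REINFORCE: gradient of log pi(action | obs) for a softmax-linear policy.
    grad_logits = -probs
    grad_logits[action] += 1.0
    W_new = W + lr * reward * np.outer(obs, grad_logits)
    return W_new, next_obs, score, reward


if __name__ == "__main__":
    obs = np.ones(OBS_DIM)
    W = W_pi.copy()
    for _ in range(5):
        W, obs, score, reward = train_step(obs, W)
        print(f"score={score:.3f} reward={reward:.3f}")
```

In this sketch the reward depends only on the latent representation of the observation that follows the action, which mirrors the ordering of steps in the Summary; any other reward shaping or policy-gradient method could be substituted.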