MODEL-FREE CONTROL FOR REINFORCEMENT LEARNING AGENTS

Bibliographic Details
Main Authors: URIA-MARTINEZ, Benigno; BLUNDELL, Charles
Format: Patent
Language: English, French, German
Published: 27.03.2024

Summary: Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One method includes maintaining return data that maps each observation-action pair to a respective return, the action in each observation-action pair being an action that was performed by the agent in response to the observation in the observation-action pair and the respective return mapped to by each of the observation-action pairs being a return that resulted from the agent performing the action in the observation-action pair; receiving a current observation; determining whether the current observation matches any observation identified in the return data; and in response to determining that the current observation matches a first observation identified in the return data, selecting an action to be performed by the agent using the returns mapped to by observation-action pairs in the return data that include the first observation.
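
The steps named in the summary (maintain return data, receive an observation, check for a match, select an action from the stored returns) can be illustrated with a minimal Python sketch. This is not the patented implementation. The class name `ReturnTable`, its methods, exact-equality matching of observations, keeping the highest return per pair, choosing the highest-return action, and the random fallback when no observation matches are all assumptions made for illustration; the summary does not specify any of them.

```python
import random
from collections import defaultdict


class ReturnTable:
    """Hypothetical container for the 'return data' described in the summary."""

    def __init__(self, actions, seed=0):
        self.actions = list(actions)
        # Maps observation -> {action: return}. Observations must be hashable.
        self.returns = defaultdict(dict)
        self.rng = random.Random(seed)

    def record(self, observation, action, ret):
        # Assumption: keep the highest return seen for each observation-action pair.
        previous = self.returns[observation].get(action)
        if previous is None or ret > previous:
            self.returns[observation][action] = ret

    def select_action(self, observation):
        # Assumption: "matches" means exact equality with a stored observation.
        known = self.returns.get(observation)
        if known:
            # Assumption: pick the action with the highest stored return.
            return max(known, key=known.get)
        # Assumption: the summary does not cover the no-match case;
        # this sketch falls back to a uniformly random action.
        return self.rng.choice(self.actions)


table = ReturnTable(actions=["left", "right"])
table.record("s0", "left", 1.0)
table.record("s0", "right", 2.5)
print(table.select_action("s0"))  # right
```

A nested dictionary keeps the match check and the lookup of all returns for a matched observation to a single hash lookup, which is one plausible reading of "return data that maps each observation-action pair to a respective return".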
Bibliography: Application Number: EP20170727993