MODEL-FREE CONTROL FOR REINFORCEMENT LEARNING AGENTS
Format | Patent |
---|---|
Language | English, French, German |
Published | 27.03.2024 |
Summary: | Methods, systems, and apparatus for selecting actions to be performed by an agent interacting with an environment. One method includes maintaining return data that maps each observation-action pair to a respective return, the action in each observation-action pair being an action that was performed by the agent in response to the observation in the observation-action pair and the respective return mapped to by each of the observation-action pairs being a return that resulted from the agent performing the action in the observation-action pair; receiving a current observation; determining whether the current observation matches any observation identified in the return data; and in response to determining that the current observation matches a first observation identified in the return data, selecting an action to be performed by the agent using the returns mapped to by observation-action pairs in the return data that include the first observation. |
---|---|
Bibliography: | Application Number: EP20170727993 |
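
The selection procedure in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ReturnTable`, the exact-equality test for a "matching" observation, the greedy choice of the highest-return action, the keep-the-largest-return update, and the random fallback for unmatched observations are all assumptions made for illustration, since the summary does not specify them.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple


class ReturnTable:
    """Hypothetical container for the 'return data' described in the summary."""

    def __init__(self, actions: List[int]) -> None:
        self.actions = actions
        # Maps (observation, action) -> return that resulted from the agent
        # performing that action in response to that observation.
        self.returns: Dict[Tuple[Hashable, int], float] = {}

    def record(self, observation: Hashable, action: int, ret: float) -> None:
        key = (observation, action)
        # Assumption: when a pair is seen again, keep the larger return.
        self.returns[key] = max(ret, self.returns.get(key, ret))

    def select_action(
        self, observation: Hashable, rng: Optional[random.Random] = None
    ) -> int:
        chooser = rng or random
        # Assumption: an observation "matches" only if it is exactly equal.
        known = {
            action: ret
            for (obs, action), ret in self.returns.items()
            if obs == observation
        }
        if known:
            # Assumption: act greedily on the stored returns.
            return max(known, key=known.get)
        # Assumption: the summary does not cover the no-match case;
        # this sketch falls back to a uniformly random action.
        return chooser.choice(self.actions)


table = ReturnTable(actions=[0, 1, 2])
table.record("s0", 0, 1.0)
table.record("s0", 1, 5.0)
chosen = table.select_action("s0")  # uses the returns stored for "s0"
```

A dictionary keyed by observation-action pairs keeps the lookup simple; a real system would likely need an approximate matching scheme for high-dimensional observations, which this sketch does not attempt.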