STACKED CONVOLUTIONAL LONG SHORT-TERM MEMORY FOR MODEL-FREE REINFORCEMENT LEARNING

Bibliographic Details
Main Authors GREGOR, Karol; GUEZ, Arthur Clement; KABRA, Rishabh; MIRZA MOHAMMADI, Mehdi
Format Patent
Language English
French
German
Published 10.03.2021

Abstract Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling an agent interacting with an environment. One of the methods includes obtaining a representation of an observation; processing the representation using a convolutional long short-term memory (LSTM) neural network comprising a plurality of convolutional LSTM neural network layers; processing an action selection input comprising the final LSTM hidden state output for the time step using an action selection neural network that is configured to receive the action selection input and to process the action selection input to generate an action selection output that defines an action to be performed by the agent at the time step; selecting, from the action selection output, the action to be performed by the agent at the time step in accordance with an action selection policy; and causing the agent to perform the selected action.
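The pipeline in the abstract (representation, stacked convolutional LSTM layers, final hidden state, action selection network, action selection policy) can be sketched in Python with NumPy. This is a minimal, hypothetical illustration under stated assumptions, not the patented implementation: the names `conv2d_same`, `ConvLSTMLayer`, and `StackedConvLSTMAgent`, the 3x3 kernels, the random initialization, the linear softmax action selection head, and sampling as the action selection policy are all assumptions. Each convolutional LSTM layer computes its gates with convolutions instead of matrix products, so the hidden and cell states keep the spatial shape of the observation representation; the hidden state of the last layer plays the role of the final LSTM hidden state output.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w, b):
    """Naive zero-padded 'same' 2-D cross-correlation (what DL libraries call conv).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.empty((c_out, height, width))
    for r in range(height):
        for col in range(width):
            patch = xp[:, r:r + k, col:col + k]  # (C_in, k, k)
            out[:, r, col] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


class ConvLSTMLayer:
    """Hypothetical convolutional LSTM layer: all gates are computed by convolution."""

    def __init__(self, in_ch, hid_ch, k, rng):
        # One kernel produces the pre-activations of all four gates at once.
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros(4 * hid_ch)

    def step(self, x, state):
        h, c = state
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w, self.b)
        i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output gates; candidate
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, (h_new, c_new)


class StackedConvLSTMAgent:
    """Hypothetical agent: stacked ConvLSTM core plus a linear softmax policy head."""

    def __init__(self, obs_ch, hid_ch, depth, height, width, num_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        # Layer 0 reads the observation representation; each later layer reads
        # the hidden state of the layer below it.
        self.layers = [
            ConvLSTMLayer(obs_ch if d == 0 else hid_ch, hid_ch, 3, self.rng)
            for d in range(depth)
        ]
        self.state_shape = (hid_ch, height, width)
        self.w_pi = self.rng.normal(0.0, 0.1, size=(num_actions, hid_ch * height * width))
        self.b_pi = np.zeros(num_actions)

    def initial_state(self):
        zeros = np.zeros(self.state_shape)
        return [(zeros, zeros) for _ in self.layers]

    def act(self, obs_repr, states):
        x, new_states = obs_repr, []
        for layer, state in zip(self.layers, states):
            x, state = layer.step(x, state)
            new_states.append(state)
        # x is now the final LSTM hidden state output for this time step.
        logits = self.w_pi @ x.ravel() + self.b_pi
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Assumed action selection policy: sample from the softmax distribution.
        action = int(self.rng.choice(len(probs), p=probs))
        return action, probs, new_states


if __name__ == "__main__":
    agent = StackedConvLSTMAgent(obs_ch=2, hid_ch=4, depth=3, height=5, width=5, num_actions=4)
    states = agent.initial_state()
    obs_rng = np.random.default_rng(1)
    for t in range(3):  # the recurrent state is carried across time steps
        obs = obs_rng.normal(size=(2, 5, 5))
        action, probs, states = agent.act(obs, states)
        print(t, action, probs.round(3))
```

Running the module steps the untrained agent through three time steps, carrying each layer's `(h, c)` pair forward. In practice the weights would be learned with a model-free reinforcement learning algorithm, which this sketch does not attempt to show.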
DatabaseName esp@cenet
Discipline Medicine
Chemistry
Sciences
Physics
DocumentTitleAlternate GESTAPELTER GEFALTETER LANG-/KURZZEITSPEICHER FÜR MODELLFREIES VERSTÄRKUNGSLERNEN (German)
MÉMOIRE À LONG TERME ET À COURT TERME À CONVOLUTION EMPILÉE POUR APPRENTISSAGE PAR RENFORCEMENT SANS MODÈLE (French)
ExternalDocumentID EP3788549A1
IngestDate Fri Oct 25 05:39:55 EDT 2024
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Notes Application Number: EP20190782532
OpenAccessLink https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210310&DB=EPODOC&CC=EP&NR=3788549A1
PublicationCentury 2000
PublicationDate 20210310
PublicationDateYYYYMMDD 2021-03-10
PublicationDecade 2020
PublicationYear 2021
RelatedCompanies DeepMind Technologies Limited
SourceType Open Access Repository
SubjectTerms CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
PHYSICS
URI https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210310&DB=EPODOC&locale=&CC=EP&NR=3788549A1
linkProvider European Patent Office