Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty

Bibliographic Details
Main Authors: Russel, Reazul Hasan; Benosman, Mouhacine; Van Baar, Jeroen
Format: Journal Article (preprint)
Language: English
Published: 2020-10-09
Subjects: Computer Science - Learning
DOI: 10.48550/arxiv.2010.04870
Online Access: https://arxiv.org/abs/2010.04870
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0

Abstract: In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. In the framework of model-based RL, we propose to merge the theory of constrained Markov decision processes (CMDPs) with the theory of robust Markov decision processes (RMDPs), leading to a formulation of robust constrained MDPs (RCMDPs). This formulation, simple in essence, allows us to design RL algorithms that are robust in performance and provide constraint-satisfaction guarantees with respect to uncertainties in the system's state transition probabilities. RCMDPs are important for real-life applications of RL. For instance, such a formulation can play an important role in policy transfer from simulation to the real world (Sim2Real) in safety-critical applications, which would benefit from performance and safety guarantees that are robust w.r.t. model uncertainty. We first propose the general problem formulation under the concept of RCMDPs, and then propose a Lagrangian formulation of the optimization problem, leading to a robust-constrained policy gradient RL algorithm. We finally validate this concept on the inventory management problem.
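The Lagrangian relaxation named in the abstract can be sketched in standard CMDP notation; the symbols below (return $\rho$, constraint cost $d$, constraint budget $\beta$, multiplier $\lambda$, transition-uncertainty set $\mathcal{P}$) are illustrative assumptions, not taken from this record:

\[
\max_{\pi} \; \min_{\lambda \ge 0} \; \min_{p \in \mathcal{P}} \; \Big[ \rho(\pi, p) - \lambda \big( d(\pi, p) - \beta \big) \Big]
\]

Here the inner minimization over $p$ enforces robustness to the worst-case transition model in $\mathcal{P}$: if the constraint $d \le \beta$ is violated, the multiplier term drives the objective down, so alternating gradient updates on $\pi$ and $\lambda$ yield a robust-constrained policy gradient method of the kind the abstract describes.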