Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty
Main Authors | Russel, Reazul Hasan; Benosman, Mouhacine; Van Baar, Jeroen |
---|---|
Format | Journal Article |
Language | English |
Published | 09.10.2020 |
Subjects | Computer Science - Learning |
Online Access | https://arxiv.org/abs/2010.04870 |
Abstract | In this paper, we focus on the problem of robustifying reinforcement learning
(RL) algorithms with respect to model uncertainties. In the framework of
model-based RL, we propose to merge the theory of constrained Markov decision
processes (CMDPs) with the theory of robust Markov decision processes (RMDPs),
leading to a formulation of robust constrained MDPs (RCMDPs). This
formulation, simple in essence, allows us to design RL algorithms that are
robust in performance and that provide constraint-satisfaction guarantees with
respect to uncertainties in the system's state transition probabilities.
RCMDPs are important for real-life applications of RL: for instance, such a
formulation can play an important role in policy transfer from simulation to
the real world (Sim2Real) in safety-critical applications, which benefit from
performance and safety guarantees that are robust w.r.t. model uncertainty. We
first state the general problem formulation under the RCMDP concept, then
propose a Lagrangian formulation of the optimization problem, leading to a
robust-constrained policy-gradient RL algorithm. We finally validate this
concept on the inventory management problem. |
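The Lagrangian relaxation mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming a reward r, a constraint cost c with budget d, an uncertainty set over transition models, and a multiplier lambda; the notation is our own shorthand, not quoted from the paper.

```latex
% Hedged sketch of the RCMDP program and its Lagrangian relaxation.
% Notation (uncertainty set \mathcal{P}, budget d, multiplier \lambda) is
% illustrative shorthand, not the paper's exact formulation. Needs amsmath.
\[
  \max_{\pi}\; \min_{P \in \mathcal{P}}\;
  \mathbb{E}_{P,\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
  \quad \text{s.t.} \quad
  \max_{P \in \mathcal{P}}\;
  \mathbb{E}_{P,\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d,
\]
\[
  L(\pi, \lambda) =
  \min_{P \in \mathcal{P}} \mathbb{E}_{P,\pi}\Big[\sum_{t} \gamma^{t} r_t\Big]
  - \lambda \Big( \max_{P \in \mathcal{P}}
    \mathbb{E}_{P,\pi}\Big[\sum_{t} \gamma^{t} c_t\Big] - d \Big).
\]
```

Ascending in the policy and descending in lambda (kept nonnegative) yields the soft-constrained robust policy-gradient scheme the abstract refers to.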
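A rough sketch of the corresponding primal-dual policy-gradient update is below, assuming worst-case (robust) estimates of the return and constraint-cost gradients are available; all names, step sizes, and the projection are our own placeholders, not the paper's implementation.

```python
import numpy as np

def primal_dual_step(theta, lam, grad_return, grad_cost, robust_cost, d,
                     eta_theta=1e-2, eta_lam=1e-2):
    """One update on L(theta, lam) = robust_return - lam * (robust_cost - d).

    theta       -- policy parameters
    lam         -- Lagrange multiplier, kept >= 0
    grad_return -- estimated gradient of the worst-case return w.r.t. theta
    grad_cost   -- estimated gradient of the worst-case constraint cost
    robust_cost -- estimated worst-case cumulative constraint cost
    d           -- constraint budget
    """
    # Ascend on the policy: improve worst-case return, penalized by the cost.
    theta = theta + eta_theta * (grad_return - lam * grad_cost)
    # Increase lam when the constraint is violated (robust_cost > d),
    # decrease it otherwise, projecting back onto lam >= 0.
    lam = max(0.0, lam + eta_lam * (robust_cost - d))
    return theta, lam

# Toy usage with placeholder gradient estimates.
theta, lam = np.zeros(4), 0.0
theta, lam = primal_dual_step(theta, lam,
                              grad_return=np.ones(4),
                              grad_cost=0.5 * np.ones(4),
                              robust_cost=1.2, d=1.0)
```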
Author | Russel, Reazul Hasan; Benosman, Mouhacine; Van Baar, Jeroen |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2010.04870 |
OpenAccessLink | https://arxiv.org/abs/2010.04870 |
PublicationDate | 2020-10-09 |
SecondaryResourceType | preprint |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning |
Title | Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty |
URI | https://arxiv.org/abs/2010.04870 |
linkProvider | Cornell University |