Unpredictable Planning Under Partial Observability
Main Authors | Hibbard, Michael; Savas, Yagiz; Wu, Bo; Tanaka, Takashi; Topcu, Ufuk |
---|---|
Format | Journal Article |
Language | English |
Published | 18.03.2019 |
Subjects | Mathematics - Optimization and Control |
DOI | 10.48550/arxiv.1903.07665 |
Abstract | We study the problem of synthesizing a controller that maximizes the entropy
of a partially observable Markov decision process (POMDP) subject to a
constraint on the expected total reward. Such a controller minimizes the
predictability of a decision-maker's trajectories while guaranteeing the
completion of a task expressed by a reward function. First, we prove that a
decision-maker with perfect observations can randomize its paths at least as
well as a decision-maker with partial observations. Then, focusing on
finite-state controllers (FSCs), we recast the entropy maximization problem as a
so-called parameter synthesis problem for a parametric Markov chain (pMC). We
show that the maximum entropy of a POMDP is lower bounded by the maximum
entropy of this pMC. Finally, we present an algorithm, based on a nonlinear
optimization problem, to synthesize an FSC that locally maximizes the entropy
of a POMDP over FSCs with the same number of memory states. In numerical
examples, we demonstrate the proposed algorithm on motion planning scenarios. |
---|---|
Author | Savas, Yagiz; Topcu, Ufuk; Hibbard, Michael; Wu, Bo; Tanaka, Takashi |
Author_xml | – sequence: 1 givenname: Michael surname: Hibbard fullname: Hibbard, Michael – sequence: 2 givenname: Yagiz surname: Savas fullname: Savas, Yagiz – sequence: 3 givenname: Bo surname: Wu fullname: Wu, Bo – sequence: 4 givenname: Takashi surname: Tanaka fullname: Tanaka, Takashi – sequence: 5 givenname: Ufuk surname: Topcu fullname: Topcu, Ufuk |
BackLink | https://doi.org/10.48550/arXiv.1903.07665 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKZ GOX |
DatabaseName | arXiv Mathematics arXiv.org |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 1903_07665 |
GroupedDBID | AKZ GOX |
ID | FETCH-arxiv_primary_1903_076653 |
IEDL.DBID | GOX |
IngestDate | Wed Jul 23 01:56:27 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-arxiv_primary_1903_076653 |
OpenAccessLink | https://arxiv.org/abs/1903.07665 |
ParticipantIDs | arxiv_primary_1903_07665 |
PublicationCentury | 2000 |
PublicationDate | 2019-03-18 |
PublicationDateYYYYMMDD | 2019-03-18 |
PublicationDate_xml | – month: 03 year: 2019 text: 2019-03-18 day: 18 |
PublicationDecade | 2010 |
PublicationYear | 2019 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Mathematics - Optimization and Control |
Title | Unpredictable Planning Under Partial Observability |
URI | https://arxiv.org/abs/1903.07665 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Cornell University |
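
The abstract describes maximizing the entropy of the Markov chain that a fixed controller induces on the POMDP. This record does not say how that entropy is evaluated, so the following is only a minimal sketch under stated assumptions: the induced chain is absorbing, and the entropy of its paths is computed with the standard decomposition "expected visits to each transient state times the entropy of that state's transition row". The function name `path_entropy` and the example matrices are hypothetical and are not taken from the paper.

```python
import math


def path_entropy(P, init, tol=1e-12, max_iter=100_000):
    """Entropy (in nats) of the paths of an absorbing Markov chain.

    P    : row-stochastic transition matrix as a list of lists
    init : initial distribution over states
    Uses H = sum_s xi(s) * L(s), where xi(s) is the expected number of
    visits to transient state s and L(s) is the entropy of row s.
    """
    n = len(P)
    # Assumption: a state is absorbing iff it loops to itself with probability 1.
    transient = [s for s in range(n) if P[s][s] < 1.0]

    # Expected visits satisfy xi = init + P^T xi on the transient states.
    # Fixed-point iteration converges when absorption is certain.
    xi = [0.0] * n
    for _ in range(max_iter):
        new = [0.0] * n
        for s in transient:
            new[s] = init[s] + sum(xi[r] * P[r][s] for r in transient)
        delta = max(abs(a - b) for a, b in zip(new, xi))
        xi = new
        if delta < tol:
            break

    def local_entropy(s):
        # Entropy of the next-state distribution at state s.
        return -sum(p * math.log(p) for p in P[s] if p > 0.0)

    return sum(xi[s] * local_entropy(s) for s in transient)


# Hypothetical two-state chain: state 0 repeats or is absorbed with equal odds.
P_random = [[0.5, 0.5], [0.0, 1.0]]
# Hypothetical deterministic chain: state 0 moves straight to the absorbing state.
P_fixed = [[0.0, 1.0], [0.0, 1.0]]

H_random = path_entropy(P_random, [1.0, 0.0])
H_fixed = path_entropy(P_fixed, [1.0, 0.0])
```

In this toy comparison the randomizing chain has strictly higher path entropy than the deterministic one, which is the sense in which a higher-entropy controller is less predictable to an observer.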