Unpredictable Planning Under Partial Observability

Bibliographic Details
Main Authors	Hibbard, Michael; Savas, Yagiz; Wu, Bo; Tanaka, Takashi; Topcu, Ufuk
Format	Journal Article
Language	English
Published	18.03.2019
Subjects	Mathematics - Optimization and Control
Online Access	https://arxiv.org/abs/1903.07665
DOI	10.48550/arxiv.1903.07665

Abstract	We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of a decision-maker's trajectories while guaranteeing the completion of a task expressed by a reward function. First, we prove that a decision-maker with perfect observations can randomize its paths at least as well as a decision-maker with partial observations. Then, focusing on finite-state controllers (FSCs), we recast the entropy maximization problem as a parameter synthesis problem for a parametric Markov chain (pMC). We show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of this pMC. Finally, we present an algorithm, based on a nonlinear optimization problem, to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. We demonstrate the proposed algorithm in numerical examples on motion planning scenarios.
BackLink	https://doi.org/10.48550/arXiv.1903.07665 (View paper in arXiv)
ContentType	Journal Article
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
OpenAccessLink	https://arxiv.org/abs/1903.07665
SecondaryResourceType	preprint
SourceID	arxiv
SourceType	Open Access Repository