Unpredictable Planning Under Partial Observability

Bibliographic Details
Main Authors	Hibbard, Michael; Savas, Yagiz; Wu, Bo; Tanaka, Takashi; Topcu, Ufuk
Format	Journal Article
Language	English
Published	18.03.2019
Subjects	Mathematics - Optimization and Control
DOI	10.48550/arxiv.1903.07665

Summary:	We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of a decision-maker's trajectories while guaranteeing the completion of a task expressed by a reward function. First, we prove that a decision-maker with perfect observations can randomize its paths at least as well as a decision-maker with partial observations. Then, focusing on finite-state controllers (FSCs), we recast the entropy maximization problem as a so-called parameter synthesis problem for a parametric Markov chain (pMC). We show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of this pMC. Finally, we present an algorithm, based on a nonlinear optimization problem, to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In numerical examples, we demonstrate the proposed algorithm on motion planning scenarios.
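
The quantity the summary refers to can be illustrated with a small sketch. It is not the paper's algorithm: it only evaluates the path entropy of the Markov chain that a fixed finite-state controller induces on a toy POMDP, using the standard identity that the path entropy of an absorbing chain equals the sum over transient states of expected visits times local transition entropy. The FSC convention (action choice `gamma[n][o][a]`, memory update `delta[n][o][a][n2]`), the function names, and the four-state toy model are all assumptions made for illustration; the paper's own definitions may differ.

```python
import math

def induced_chain(T, O, gamma, delta):
    """Markov chain over (state, memory) pairs induced by an FSC (assumed convention).
    T[s][a][s2]: transition prob, O[s][o]: observation prob,
    gamma[n][o][a]: action prob, delta[n][o][a][n2]: memory-update prob."""
    n_mem = len(gamma)
    size = len(T) * n_mem
    P = [[0.0] * size for _ in range(size)]
    for s in range(len(T)):
        for n in range(n_mem):
            row = P[s * n_mem + n]
            for o, p_o in enumerate(O[s]):
                for a, p_a in enumerate(gamma[n][o]):
                    for n2, p_n in enumerate(delta[n][o][a]):
                        for s2, p_s in enumerate(T[s][a]):
                            row[s2 * n_mem + n2] += p_o * p_a * p_n * p_s
    return P

def path_entropy(P, init, absorbing, max_steps=10_000):
    """Sum over transient states of (expected visits) * (local entropy), in nats."""
    transient = [i for i in range(len(P)) if i not in absorbing]
    visits = [0.0] * len(P)
    dist = list(init)
    for _ in range(max_steps):
        if sum(dist[i] for i in transient) < 1e-15:
            break  # all probability mass has been absorbed
        nxt = [0.0] * len(P)
        for i in transient:
            visits[i] += dist[i]
            for j, p in enumerate(P[i]):
                nxt[j] += dist[i] * p
        dist = nxt

    def local(row):
        return -sum(p * math.log(p) for p in row if p > 0.0)

    return sum(visits[i] * local(P[i]) for i in transient)

# Hypothetical toy model: start state 0, two routes (states 1 and 2), goal state 3.
go = [0.0, 0.0, 0.0, 1.0]
T = [[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], [go, go], [go, go], [go, go]]
O = [[1.0]] * 4                 # a single observation: the state is never revealed
delta = [[[[1.0], [1.0]]]]      # one memory node, so the update is trivial
uniform = [[[0.5, 0.5]]]        # randomize between the two routes
deterministic = [[[1.0, 0.0]]]  # always take the first route
init = [1.0, 0.0, 0.0, 0.0]

h_uniform = path_entropy(induced_chain(T, O, uniform, delta), init, {3})
h_det = path_entropy(induced_chain(T, O, deterministic, delta), init, {3})
print(h_uniform, h_det)  # the randomizing controller has the larger entropy
```

In this toy model the randomizing controller reaches entropy log 2 (about 0.693 nats) while the deterministic one has entropy 0. Both reach the goal, which is the kind of trade-off between unpredictability and task completion the summary describes.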