Unpredictable Planning Under Partial Observability
Format | Journal Article
---|---
Language | English
Published | 18.03.2019
DOI | 10.48550/arxiv.1903.07665
Summary: We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of a decision-maker's trajectories while guaranteeing the completion of a task expressed by a reward function. First, we prove that a decision-maker with perfect observations can randomize its paths at least as well as a decision-maker with partial observations. Then, focusing on finite-state controllers (FSCs), we recast the entropy maximization problem as a so-called parameter synthesis problem for a parametric Markov chain (pMC). We show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of this pMC. Finally, we present an algorithm, based on a nonlinear optimization problem, to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In numerical examples, we demonstrate the proposed algorithm on motion planning scenarios.
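To give a concrete sense of the approach described in the summary, the sketch below sets up a tiny, made-up POMDP and optimizes a memoryless (single-memory-state) FSC with a softmax parameterization, maximizing a finite-horizon trajectory-entropy surrogate of the induced Markov chain subject to an expected-total-reward floor. This is not the authors' implementation: the toy model, the horizon, the reward threshold, and the use of scipy's SLSQP solver are all illustrative assumptions.

```python
# Illustrative sketch only: entropy maximization over a memoryless FSC for a
# toy POMDP, with an expected-reward constraint, via scipy's SLSQP solver.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy POMDP: 3 states, 2 actions (left/right), 2 observations, horizon H.
nS, nA, nO, H = 3, 2, 2, 10
T = np.zeros((nS, nA, nS))           # T[s, a, s'] transition probabilities
T[0, 0] = [1.0, 0.0, 0.0]; T[0, 1] = [0.2, 0.8, 0.0]
T[1, 0] = [0.8, 0.2, 0.0]; T[1, 1] = [0.0, 0.2, 0.8]
T[2, 0] = [0.0, 0.8, 0.2]; T[2, 1] = [0.0, 0.0, 1.0]
obs = np.array([0, 0, 1])            # deterministic observation per state
R = np.array([[0.0, 0.0],            # R[s, a]: reward for moving right
              [0.0, 1.0],            # from the middle state (made up)
              [0.0, 0.0]])
init = np.array([1.0, 0.0, 0.0])     # initial state distribution
reward_floor = 0.5                    # required expected total reward

def policy(theta):
    """Softmax action distribution pi[o, a] from unconstrained parameters."""
    z = theta.reshape(nO, nA)
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def induced_chain(pi):
    """Transition matrix of the Markov chain induced by the policy pi."""
    return np.einsum('sa,sax->sx', pi[obs], T)

def entropy_and_reward(theta):
    """Finite-horizon trajectory entropy and expected total reward."""
    pi = policy(theta)
    P = induced_chain(pi)
    d, ent, rew = init.copy(), 0.0, 0.0
    for _ in range(H):
        # Local next-state entropy, weighted by the current state distribution.
        ent += np.sum(d * np.sum(-P * np.log(P + 1e-12), axis=1))
        rew += np.sum(d[:, None] * pi[obs] * R)
        d = d @ P
    return ent, rew

res = minimize(
    lambda th: -entropy_and_reward(th)[0],            # maximize entropy
    x0=rng.normal(size=nO * nA),
    method='SLSQP',
    constraints=[{'type': 'ineq',
                  'fun': lambda th: entropy_and_reward(th)[1] - reward_floor}],
)
ent, rew = entropy_and_reward(res.x)
print('entropy:', ent, 'expected reward:', rew)
```

The softmax parameterization keeps the action distributions on the probability simplex without explicit constraints, so the only constraint handed to the solver is the reward floor; a controller with more memory states would additionally parameterize a memory-update distribution, as in the FSCs discussed in the paper.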