Weakly Coupled Markov Decision Processes with Imperfect Information


Bibliographic Details
Published in: 2019 Winter Simulation Conference (WSC), pp. 3609 - 3620
Main Authors: Parizi, Mahshid Salemi; Ghate, Archis
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2019

Summary: Weakly coupled Markov decision processes (MDPs) are stochastic dynamic programs where decisions in independent sub-MDPs are linked via constraints. Their exact solution is computationally intractable. Numerical experiments have shown that Lagrangian relaxation can be an effective approximation technique. This paper considers two classes of weakly coupled MDPs with imperfect information. In the first case, the transition probabilities for each sub-MDP are characterized by parameters whose values are unknown. This yields a Bayes-adaptive weakly coupled MDP. In the second case, the decision-maker cannot observe the actual state and instead receives a noisy signal. This yields a weakly coupled partially observable MDP. Computationally tractable approximate dynamic programming methods combining semi-stochastic certainty equivalent control or Thompson sampling with Lagrangian relaxation are proposed. These methods are applied to a class of stochastic dynamic resource allocation problems and to restless multi-armed bandit problems with partially observable states. Insights are drawn from numerical experiments.
ISSN:1558-4305
DOI:10.1109/WSC40007.2019.9004927
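The Lagrangian relaxation mentioned in the summary can be sketched as follows: dualizing the linking (budget) constraint with a multiplier lam decouples the problem into independent sub-MDPs, each solvable by ordinary value iteration on a penalized reward, and minimizing the resulting bound over lam tightens it. The Python sketch below is purely illustrative, not the paper's implementation; all model data, function names, and the starting-state convention are assumptions.

```python
import numpy as np

def solve_sub_mdp(P, r, cost, lam, gamma=0.95, iters=500):
    """Value iteration on one sub-MDP with Lagrangian-penalized reward.

    P[a, s, s'] : transition probabilities, r[s, a] : rewards,
    cost[s, a]  : resource consumption, dualized with multiplier lam.
    """
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r - lam * cost + gamma * np.einsum('ast,t->as', P, V).T
        V = Q.max(axis=1)
    return V

def lagrangian_bound(sub_mdps, budget, lam_grid, gamma=0.95):
    """Upper bound on the optimal value of the weakly coupled MDP.

    For any lam >= 0, L(lam) = lam * budget / (1 - gamma)
    + sum_i V_i^lam(s_i) bounds the constrained optimum from above,
    so we minimize over a grid of multipliers.
    """
    best = float('inf')
    for lam in lam_grid:
        total = lam * budget / (1 - gamma)
        for P, r, cost in sub_mdps:
            total += solve_sub_mdp(P, r, cost, lam, gamma)[0]  # start state 0 (assumed)
        best = min(best, total)
    return best

# Toy restless-bandit-style sub-MDP: 2 states, actions {passive, active};
# the active action consumes one unit of the shared per-period budget.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # passive: slow drift
              [[0.5, 0.5], [0.5, 0.5]]])  # active: fast mixing
r = np.array([[0.0, 0.0], [0.0, 1.0]])    # reward only for acting in state 1
cost = np.array([[0.0, 1.0], [0.0, 1.0]]) # acting costs 1 in either state
sub = (P, r, cost)
bound = lagrangian_bound([sub, sub], budget=1.0,
                         lam_grid=[0.0, 0.25, 0.5, 1.0])
```

Because each sub-MDP is solved separately, the cost of evaluating the bound grows linearly in the number of sub-MDPs rather than exponentially in the joint state space, which is the computational appeal of the relaxation.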