Competitive Markov decision processes with partial observation
Published in | 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583), Vol. 1, pp. 236-241 |
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | Piscataway, NJ: IEEE, 2004 |
Subjects | |
Summary: | We study a class of Markov decision processes (MDPs) over an infinite time horizon in which there are two controllers and the observation information is allowed to be imperfect. Suppose the system state space and the action space are both finite, and that the controllers, having conflicting interests with each other, make decisions independently, each seeking its own best long-run average profit. Under the hypothesis that at least one system state is perfectly observable and accessible (from every system state, no matter what actions are taken), we prove the existence of optimal policies for both controllers and characterize them by min-max type dynamic programming equations (the general form is sketched after the record below). An example from a class of machine maintenance processes is presented to illustrate the results. |
ISBN: | 0780385667; 9780780385665 |
ISSN: | 1062-922X; 2577-1655 |
DOI: | 10.1109/ICSMC.2004.1398303 |
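
The summary mentions min-max dynamic programming equations but does not state them. As a hedged sketch, assuming the standard formulation for a two-controller, zero-sum stochastic game with long-run average payoff (the symbols g, h, r, p, S, A, B below are illustrative notation, not taken from the paper):

```latex
% Hedged sketch (not quoted from the paper): the standard min-max optimality
% equation for a zero-sum stochastic game with finite state space S, finite
% action sets A and B, one-stage reward r, transition law p, long-run average
% value g, and bias (relative value) function h. Mixed actions pi and sigma
% range over the probability simplices Delta(A) and Delta(B).
\[
  g + h(s)
  = \max_{\pi \in \Delta(A)} \; \min_{\sigma \in \Delta(B)}
    \sum_{a \in A} \sum_{b \in B} \pi(a)\,\sigma(b)
    \Bigl( r(s,a,b) + \sum_{s' \in S} p(s' \mid s,a,b)\, h(s') \Bigr),
  \qquad s \in S .
\]
% Under partial observation, an equation of this type would be posed on the
% controllers' information (belief) states rather than directly on S.
```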