Percentile performance criteria for limiting average Markov decision processes
Published in | IEEE transactions on automatic control Vol. 40; no. 1; pp. 2 - 10 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York, NY: IEEE (Institute of Electrical and Electronics Engineers), 01.01.1995 |
Subjects | |
Summary: | Addresses the following basic feasibility problem for infinite-horizon Markov decision processes (MDPs): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile and vice versa are also considered. The authors present a complete (and discrete) classification of both the maximal achievable target levels and of their corresponding percentiles. The authors also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next, the authors consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases. |
---|---|
ISSN: | 0018-9286, 1558-2523 |
DOI: | 10.1109/9.362904 |