Percentile performance criteria for limiting average Markov decision processes

Addresses the following basic feasibility problem for infinite-horizon Markov decision processes (MDPs): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing t...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on automatic control Vol. 40; no. 1; pp. 2 - 10
Main Authors Filar, J.A., Krass, D., Ross, K.W.
Format Journal Article
LanguageEnglish
Published New York, NY IEEE 01.01.1995
Institute of Electrical and Electronics Engineers
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Addresses the following basic feasibility problem for infinite-horizon Markov decision processes (MDPs): can a policy be found that achieves a specified value (target) of the long-run limiting average reward at a specified probability level (percentile)? Related optimization problems of maximizing the target for a specified percentile and vice versa are also considered. The authors present a complete (and discrete) classification of both the maximal achievable target levels and of their corresponding percentiles. The authors also provide an algorithm for computing a deterministic policy corresponding to any feasible target-percentile pair. Next the authors consider similar problems for an MDP with multiple rewards and/or constraints. This case presents some difficulties and leads to several open problems. An LP-based formulation provides constructive solutions for most cases.< >
Bibliography:ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
ISSN:0018-9286
1558-2523
DOI:10.1109/9.362904