Semi-Markov decision problems and performance sensitivity analysis


Bibliographic Details
Published in: IEEE Transactions on Automatic Control, Vol. 48, No. 5, pp. 758-769
Main Author: CAO, Xi-Ren
Format: Journal Article
Language: English
Published: New York, NY: IEEE, 01.05.2003

Summary: Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view, and that perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in the optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is twofold. First, we develop the PA theory for semi-Markov processes (SMPs); we then extend the aforementioned results on the relations among PA, MDPs, and RL to SMPs. In particular, we show that performance sensitivity formulas and policy iteration algorithms for semi-Markov decision processes can be derived from the performance potential and the realization matrix. Both the long-run average and discounted-cost problems are considered. This approach provides a unified framework for both problems, with the long-run average problem corresponding to a discount factor of zero. The results indicate that performance sensitivities and optimization depend only on first-order statistics. Single-sample-path-based implementations are discussed.
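The summary describes deriving policy iteration for semi-Markov decision processes from performance potentials. As an illustration only, the following is a minimal numerical sketch of that idea for the long-run average-cost case; the state/action sets and the model data P (embedded-chain transition probabilities), tau (mean sojourn times), and f (expected costs) are invented for the example and are not taken from the paper.

```python
# Sketch: average-cost SMDP policy iteration via performance potentials.
# All model data below are illustrative assumptions, not from the paper.
import numpy as np

S, A = 3, 2                                  # small invented state/action sets
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s']: embedded-chain transitions
tau = rng.uniform(0.5, 2.0, size=(S, A))     # tau[s, a]: mean sojourn time in state s
f = rng.uniform(0.0, 1.0, size=(S, A))       # f[s, a]: expected cost per sojourn

def evaluate(policy):
    """Average cost eta and performance potentials g under a stationary policy."""
    idx = np.arange(S)
    Pd, fd, td = P[idx, policy], f[idx, policy], tau[idx, policy]
    # Stationary distribution pi of the embedded chain: pi = pi Pd, sum(pi) = 1.
    Aeq = np.vstack([Pd.T - np.eye(S), np.ones(S)])
    pi = np.linalg.lstsq(Aeq, np.r_[np.zeros(S), 1.0], rcond=None)[0]
    eta = (pi @ fd) / (pi @ td)              # long-run average cost per unit time
    # Potentials solve the Poisson equation (I - Pd) g = fd - eta * td, which is
    # consistent since pi @ (fd - eta * td) = 0; adding the rank-one term
    # e pi^T makes the system uniquely solvable (fundamental-matrix trick).
    g = np.linalg.solve(np.eye(S) - Pd + np.outer(np.ones(S), pi), fd - eta * td)
    return eta, g

def improve(eta, g):
    """Greedy improvement: minimize f(s,a) - eta*tau(s,a) + sum_s' p(s'|s,a) g(s')."""
    Q = f - eta * tau + P @ g                # Q[s, a] via batched matrix-vector product
    return Q.argmin(axis=1)

policy = np.zeros(S, dtype=int)
while True:                                  # policy iteration: evaluate, then improve
    eta, g = evaluate(policy)
    new_policy = improve(eta, g)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("policy:", policy, "average cost:", float(eta))
```

The sketch covers only the average-cost case (the discount factor of zero in the paper's unified framework); the potentials g are computed here by directly solving a linear system, whereas the paper also discusses estimating such quantities from a single sample path.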
ISSN: 0018-9286
EISSN: 1558-2523
DOI: 10.1109/TAC.2003.811252