Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm

Bibliographic Details
Published in	European Journal of Operational Research, Vol. 240, no. 3, pp. 697-705
Main Authors	Arruda, E.F., Fragoso, M.D.
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V., 01.02.2015; Elsevier Sequoia S.A.
Subjects	Cost analysis; Dynamic programming; Embedding; Markov analysis; Markov decision processes; Mathematical problems; Numerical analysis; Optimization algorithms; Stochastic optimal control; Studies; Time aggregation
Summary:	• We introduce a two-phase time aggregation algorithm for MDPs.
• The algorithm enables policy improvement outside of the time aggregated MDP domain.
• The two phases enable optimization over the entire state space.
• Improved approximate solutions can be obtained by employing the proposed approach.
This paper introduces a two-phase approach for solving average cost Markov decision processes, based on state space embedding or time aggregation. In the first phase, time aggregation is applied for policy optimization in a prescribed subset of the state space, and a novel result is applied to extend the evaluation to the whole state space. The second phase uses this evaluation in a policy improvement step. The two phases are alternated until convergence is attained. Numerical experiments illustrate the results.
ISSN:	0377-2217, 1872-6860
DOI:	10.1016/j.ejor.2014.08.023
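
Illustration (not taken from the article): the summary relies on time aggregation, in which a chain is observed only at its visits to a prescribed subset of states. The sketch below is a minimal, pure-Python illustration of that idea for a fixed policy, and it is not the authors' two-phase algorithm. It assumes a small hypothetical 4-state chain `P`, per-step costs `cost`, and a subset `F`, all invented for the example. It checks that the long-run average cost obtained from the embedded chain on `F` (expected cost per sojourn divided by expected sojourn length) agrees with the average cost computed over the whole state space.

```python
def sojourn_totals(P, reward, F, iters=2000):
    """For each i in F: expected total of `reward` collected from i until the
    chain next enters F (the reward at i itself is included)."""
    n = len(P)
    outside = [s for s in range(n) if s not in F]
    v = [0.0] * n  # v[s], s outside F: expected total collected before hitting F
    for _ in range(iters):  # fixed-point iteration; assumes F is reachable
        new_v = list(v)
        for s in outside:
            new_v[s] = reward[s] + sum(P[s][k] * v[k] for k in outside)
        v = new_v
    return {i: reward[i] + sum(P[i][k] * v[k] for k in outside) for i in F}


def stationary(P, iters=5000):
    """Stationary distribution of an irreducible chain. The lazy update
    0.5*pi + 0.5*pi*P has the same fixed point and avoids periodicity issues."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi


def aggregated_average_cost(P, cost, F):
    """Average cost computed only from quantities attached to the subset F."""
    n = len(P)
    h_cost = sojourn_totals(P, cost, F)        # expected cost per sojourn
    h_time = sojourn_totals(P, [1.0] * n, F)   # expected sojourn length
    # Embedded transition matrix: Q[a][b] is the probability that the next
    # visit to F, starting from F[a], happens at F[b].
    cols = [sojourn_totals(P, [P[s][j] for s in range(n)], F) for j in F]
    Q = [[cols[b][i] for b in range(len(F))] for i in F]
    pi_F = stationary(Q)
    num = sum(pi_F[a] * h_cost[i] for a, i in enumerate(F))
    den = sum(pi_F[a] * h_time[i] for a, i in enumerate(F))
    return num / den


# Hypothetical example data (not from the article).
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.6, 0.0, 0.4, 0.0],
]
cost = [1.0, 4.0, 2.0, 0.5]
F = [0, 2]  # prescribed subset of the state space

eta_aggregated = aggregated_average_cost(P, cost, F)
eta_direct = sum(p * c for p, c in zip(stationary(P), cost))
print(eta_aggregated, eta_direct)
```

The two printed values should agree up to numerical tolerance. In the article's setting the policy is additionally optimized on the subset and then improved over the whole state space; that step depends on the paper's own result and is not reproduced here.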