A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 07.05.2020 |
Subjects | |
DOI | 10.48550/arxiv.2005.03718 |
Summary: | The canonical solution methodology for finite constrained Markov decision
processes (CMDPs), where the objective is to maximize the expected
infinite-horizon discounted reward subject to constraints on the expected
infinite-horizon discounted costs, is based on convex linear programming. In this
brief, we first prove that the optimization objective in the dual linear
program of a finite CMDP is a piecewise linear convex (PWLC) function of the
Lagrange penalty multipliers. Next, we propose a novel two-level
Gradient-Aware Search (GAS) algorithm that exploits the PWLC structure to find
the optimal state-value function and Lagrange penalty multipliers of a finite
CMDP. The proposed algorithm is applied to two stochastic control problems with
constraints: robot navigation in a grid world and solar-powered unmanned aerial
vehicle (UAV)-based wireless network management. We empirically compare the
convergence performance of the proposed GAS algorithm with binary search (BS),
Lagrangian primal-dual optimization (PDO), and linear programming (LP).
Compared with these benchmark algorithms, the proposed GAS algorithm is shown to
converge to the optimal solution faster, to require no hyper-parameter
tuning, and to be insensitive to the initialization of the Lagrange penalty
multiplier. |
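The PWLC property mentioned in the summary can be illustrated on a toy problem. The sketch below is not the GAS algorithm from the paper; it only evaluates a Lagrangian dual objective for a single cost constraint, assuming the standard form g(λ) = V_λ(s0) + λ·budget, where V_λ is the optimal value of the unconstrained MDP with penalized reward r − λ·c. The two-state CMDP, its numbers, and the names `penalized_value` and `dual_objective` are all hypothetical.

```python
# Toy CMDP (all numbers are hypothetical): 2 states, 2 actions.
GAMMA = 0.9                      # discount factor
P = [[[0.9, 0.1], [0.2, 0.8]],   # P[s][a][t]: probability of moving s -> t under a
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[1.0, 2.0], [0.0, 3.0]]     # reward r(s, a)
C = [[0.0, 1.0], [0.0, 2.0]]     # cost c(s, a)
BUDGET = 5.0                     # assumed bound on expected discounted cost
S0 = 0                           # initial state


def penalized_value(lam, iters=2000):
    """Value iteration on the penalized reward r - lam * c; returns V(S0)."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [
            max(
                R[s][a] - lam * C[s][a]
                + GAMMA * sum(P[s][a][t] * v[t] for t in range(2))
                for a in range(2)
            )
            for s in range(2)
        ]
    return v[S0]


def dual_objective(lam):
    """Assumed dual objective: a pointwise maximum over policies of
    functions that are linear in lam, hence piecewise linear and convex."""
    return penalized_value(lam) + lam * BUDGET


if __name__ == "__main__":
    # Scan a grid of multipliers; the kinks mark changes of the greedy policy.
    for k in range(0, 9):
        lam = 0.5 * k
        print(f"lambda={lam:.1f}  g(lambda)={dual_objective(lam):.4f}")
```

Because each linear piece corresponds to one deterministic policy, the slope of g on a piece equals the budget minus that policy's expected discounted cost. That slope is the kind of gradient information a search over λ can exploit.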