Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control
Published in | International Journal of Control, Vol. 94, no. 5, pp. 1321-1333 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Abingdon: Taylor & Francis, 04.05.2021 (Taylor & Francis Ltd) |
Subjects | |
Summary: | This study proposes a modified value-function-approximation (MVFA) and investigates its use in a neural-network (NN) based single-critic configuration for synchronous policy iteration (SPI), with the aim of a compact implementation of online optimal control synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms eliminate actor tuning but require stabilising critic tuning laws. This paper therefore studies an alternative single-critic realisation that aims to relax the need for stabilising mechanisms in the critic tuning law. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equation by solving for the associated value function via SPI in a single-critic configuration. Unlike existing single-critic methods, the approach uses an MVFA to handle closed-loop stability during online learning. To keep the problem simple, gradient-descent tuning is used to adjust the critic NN parameters. Parameter convergence and closed-loop state stability are examined. The proposed MVFA approach yields an alternative single-critic SPI method with uniformly ultimately bounded NN parameter convergence and asymptotic stability of the closed-loop system states throughout online learning, without the need for stabilising mechanisms in the critic NN tuning law. The proposed approach is verified via simulations. |
---|---|
ISSN: | 0020-7179 1366-5820 |
DOI: | 10.1080/00207179.2019.1648874 |
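
The gradient-descent critic tuning mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's MVFA; it is a generic single-critic example under stated assumptions: a scalar linear plant dx/dt = a*x + b*u, a quadratic running cost q*x^2 + r*u^2, a one-weight critic V(x) = w*x^2, and offline updates on a fixed set of sample states rather than online learning along a trajectory. The names `policy`, `bellman_residual`, `residual_gradient` and `tune_critic`, and all numeric values, are hypothetical choices made for the illustration.

```python
import math

# Illustrative scalar plant dx/dt = A*x + B*u with running cost Q*x^2 + R*u^2.
# All numbers are assumptions chosen for the sketch, not values from the paper.
A, B, Q, R = -1.0, 1.0, 1.0, 1.0


def policy(w, x):
    """Control implied by the critic V(x) = w*x^2: u = -(1/2) * (B/R) * dV/dx."""
    return -(B / R) * w * x


def bellman_residual(w, x):
    """HJB residual: running cost + dV/dx * dx/dt under the critic's own policy."""
    u = policy(w, x)
    dV_dx = 2.0 * w * x
    return Q * x * x + R * u * u + dV_dx * (A * x + B * u)


def residual_gradient(w, x):
    """d(residual)/dw; the residual simplifies to x^2 * (Q + 2*A*w - B^2*w^2/R)."""
    return x * x * (2.0 * A - 2.0 * B * B * w / R)


def tune_critic(w0=0.0, rate=1.0, sweeps=200, states=(0.5, 1.0, 1.5, 2.0)):
    """Normalised gradient descent on 0.5*residual^2 over fixed sample states."""
    w = w0
    for _ in range(sweeps):
        for x in states:
            e = bellman_residual(w, x)
            s = residual_gradient(w, x)
            # Dividing by (1 + s^2)^2 keeps the step bounded for large states.
            w -= rate * e * s / (1.0 + s * s) ** 2
    return w


if __name__ == "__main__":
    w = tune_critic()
    # Positive root of the scalar Riccati equation Q + 2*A*w - B^2*w^2/R = 0.
    w_riccati = (A + math.sqrt(A * A + B * B * Q / R)) * R / (B * B)
    print(w, w_riccati)
```

Because the control is computed directly from the critic weight, no separate actor is tuned, which is the sense in which the configuration is "single-critic". In this linear-quadratic special case the tuned weight can be checked against the positive root of the scalar Riccati equation; for a nonlinear plant no such closed-form check is available, and closed-loop stability during learning is the issue the summary says the MVFA is designed to address.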