Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control

Bibliographic Details
Published in: International Journal of Control, Vol. 94, no. 5, pp. 1321-1333
Main Authors: Tang, Difan; Chen, Lei; Tian, Zhao Feng; Hu, Eric
Format: Journal Article
Language: English
Published: Abingdon, Taylor & Francis (Taylor & Francis Ltd), 04.05.2021
Summary: This study proposes a modified value-function-approximation (MVFA) and investigates its use in a single-critic configuration based on neural networks (NNs) for synchronous policy iteration (SPI), with the aim of a compact implementation of online optimal control synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms eliminate actor tuning but require stabilising critic tuning laws. This paper therefore studies an alternative single-critic realisation that aims to relax the need for stabilising mechanisms in the critic tuning law. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equation by solving for the associated value function via SPI in a single-critic configuration. Unlike existing single-critic methods, the MVFA is used to maintain closed-loop stability during online learning. Gradient-descent tuning is employed to adjust the critic NN parameters so as not to complicate the problem. Parameter convergence and closed-loop state stability are examined. The proposed MVFA approach yields an alternative single-critic SPI method with uniformly ultimately bounded NN parameter convergence and asymptotic closed-loop state stability throughout online learning, without the need for stabilising mechanisms in the critic NN tuning law. The proposed approach is verified via simulations.
ISSN: 0020-7179, 1366-5820
DOI: 10.1080/00207179.2019.1648874
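
Illustration: the summary mentions gradient-descent tuning of a single critic, with the control law derived from the Hamilton-Jacobi-Bellman equation. The sketch below shows that general idea on a toy problem only. It is not the paper's MVFA and does not reproduce its results. Everything in it is an assumption made for illustration: a scalar linear plant x' = a*x + b*u with cost integrand q*x^2 + r*u^2, a one-weight critic V_hat(x) = w*x^2, a normalised gradient step, a state reset standing in for an excitation signal, and the hypothetical names train_critic and riccati_weight. The plant is chosen open-loop stable (a < 0), so the question of stability during learning that the paper addresses does not arise here.

```python
import math


def train_critic(a=-1.0, b=1.0, q=1.0, r=1.0, alpha=10.0, dt=0.01, steps=5000):
    """Tune one critic weight w online for V_hat(x) = w * x**2 (toy example)."""
    w = 0.0  # initial critic weight; gives u = 0, acceptable only because a < 0
    x = 1.0  # initial state
    for _ in range(steps):
        # Control derived from the critic alone (no actor network):
        # u = -(1/2) * (1/r) * b * dV_hat/dx, with dV_hat/dx = 2 * w * x.
        u = -(b / r) * w * x
        x_dot = a * x + b * u
        # Regressor: gradient of the basis x**2 times the state derivative.
        sigma = 2.0 * x * x_dot
        # Residual of the Hamilton-Jacobi-Bellman equation under V_hat.
        e = q * x**2 + r * u**2 + w * sigma
        # Normalised gradient descent on 0.5 * e**2, with u treated as fixed.
        w -= dt * alpha * sigma * e / (1.0 + sigma**2) ** 2
        # Forward-Euler step of the closed-loop plant.
        x += dt * x_dot
        # Assumption: reset the state to keep it away from zero, as a crude
        # substitute for a persistently exciting probing signal.
        if abs(x) < 0.1:
            x = 1.0
    return w


def riccati_weight(a=-1.0, b=1.0, q=1.0, r=1.0):
    """Positive root P of 2*a*P - (b**2 / r) * P**2 + q = 0."""
    k = b * b / r
    return (a + math.sqrt(a * a + k * q)) / k


if __name__ == "__main__":
    print("learned weight:", train_critic())
    print("Riccati weight:", riccati_weight())
```

For this linear-quadratic special case the optimal value function is exactly P*x^2, so the learned weight can be compared directly with the Riccati solution. For a general nonlinear plant no such closed-form check is available.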