Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

Bibliographic Details
Published in	IEEE transactions on systems, man and cybernetics. Part B, Cybernetics Vol. 38; no. 4; pp. 943-949
Main Authors	Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.
Format	Journal Article
Language	English
Published	United States IEEE 01.08.2008
Subjects	Adaptive critics; Algorithms; Approximate dynamic programming (ADP); Approximation; Computer Simulation; Convergence; Dynamic programming; Dynamical systems; Feedback; Function approximation; Hamilton Jacobi Bellman (HJB); Jacobian matrices; Mathematical analysis; Models, Theoretical; Neural networks; Nonlinear dynamics; Nonlinear equations; Nonlinear systems; Nonlinearity; Optimal control; Policy iteration; Programming, Linear; Regulators; Robotics and automation; Systems Theory; Value iteration
Summary:	Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven for general nonlinear systems. That is, it is shown that HDP converges to the optimal control and to the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be solved exactly. Two standard neural networks (NNs) are used: a critic NN approximates the value function, and an action NN approximates the optimal control policy. It is stressed that this approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact-solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear and the value is quadratic in the states, so the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is generally used.
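Illustration:	The value and action updates named in the summary can be sketched for the simplest LQR case. The following is a minimal sketch, not the authors' implementation: it assumes a scalar system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2, a quadratic value V_i(x) = p_i*x^2 starting from p_0 = 0, and a linear action u = -k_i*x. The names a, b, q, r, p, k and both function names are hypothetical and do not come from the record. Unlike the two-network scheme described in the summary, this sketch uses the model parameter a directly, so it shows only the convergence of the iteration, not the model-free implementation.

```python
import math


def hdp_value_iteration_scalar_lqr(a, b, q, r, tol=1e-12, max_iter=1000):
    """Value iteration for x[k+1] = a*x[k] + b*u[k] with cost sum(q*x^2 + r*u^2).

    Returns the converged value coefficient p, the gain k (u = -k*x),
    and the list of value coefficients p_0, p_1, ...
    """
    p = 0.0  # V_0(x) = 0, the usual value-iteration starting point
    history = [p]
    for _ in range(max_iter):
        # Action update: u = -k*x minimizes r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
        # Value update: V_{i+1}(x) = q*x^2 + r*u^2 + V_i(a*x + b*u).
        p_next = q + r * k * k + p * (a - b * k) ** 2
        history.append(p_next)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)  # gain that is greedy for the final value
    return p, k, history


def scalar_dare_solution(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation.

    The fixed point of the value update satisfies
    b^2*p^2 + (r - q*b^2 - a^2*r)*p - q*r = 0.
    """
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    # Open-loop unstable example (|a| > 1); values chosen only for illustration.
    p, k, history = hdp_value_iteration_scalar_lqr(a=1.2, b=1.0, q=1.0, r=1.0)
    print("iterations:", len(history) - 1)
    print("p from value iteration:", p)
    print("p from Riccati equation:", scalar_dare_solution(1.2, 1.0, 1.0, 1.0))
    print("closed-loop pole:", 1.2 - 1.0 * k)
```

In this sketch the sequence p_i rises from zero toward the Riccati solution, and the resulting gain stabilizes the open-loop unstable example. A matrix-valued version would replace the scalar divisions with linear solves.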
ISSN:	1083-4419, 1941-0492
DOI:	10.1109/TSMCB.2008.926614