A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems

Bibliographic Details
Published in	Automatica (Oxford), Vol. 49, no. 1, pp. 82–92
Main Authors	Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K.G., Lewis, F.L., Dixon, W.E.
Format	Journal Article
Language	English
Published	Kidlington: Elsevier Ltd, 01.01.2013
Subjects	Actor–critic–identifier, Adaptative systems, Adaptive control, Applied sciences, Approximate dynamic programming, Artificial intelligence, Computer science; control theory; systems, Control theory. Systems, Exact sciences and technology, Learning control, Optimal control, Exponential convergence, Network structure, Non linear control, Lyapunov method, Continuous time, Dynamical system, Adaptive method, Uncertain system, Identifier, Continuous control, Infinite horizon, Dynamic programming, Dynamic model, Optimal control (mathematics), Numerical convergence, Intelligent control, Closed loop, Bellman equation, Reinforcement learning, Neural network, Non linear system, Closed feedback, Persistence, Value function, Hamilton Jacobi equation, Asymptotic approximation

Summary:	An online adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor–critic–identifier (ACI) architecture is proposed to approximate the solution of the Hamilton–Jacobi–Bellman equation using three neural network (NN) structures: actor and critic NNs approximate the optimal control and the optimal value function, respectively, and a robust dynamic neural network identifier asymptotically approximates the uncertain system dynamics. An advantage of the ACI architecture is that learning by the actor, critic, and identifier is continuous and simultaneous, without requiring knowledge of the system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in the neighborhood of the optimal control and uniformly ultimately bounded (UUB) stability of the closed-loop system. Simulation results demonstrate the performance of the actor–critic–identifier method for approximate optimal control.
ISSN:	0005-1098, 1873-2836
DOI:	10.1016/j.automatica.2012.09.019
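
Note on the summary: for orientation, the infinite-horizon problem and the Hamilton–Jacobi–Bellman (HJB) equation it refers to can be written in the standard textbook form below. This is a sketch under assumptions that this record does not state: a control-affine system with drift f and input gain g, a state penalty Q, a symmetric positive-definite control weight R, and the gradient of V* treated as a row vector. The symbols are illustrative and may differ from the article's own notation.

```latex
\begin{align}
  % Assumed control-affine dynamics: f is the (unknown) drift, g the input gain
  \dot{x} &= f(x) + g(x)\,u, \\
  % Infinite-horizon optimal value function with an assumed cost Q(x) + u^T R u
  V^{*}(x_0) &= \min_{u} \int_{0}^{\infty}
      \bigl( Q(x) + u^{\top} R\, u \bigr)\, dt, \\
  % HJB equation satisfied by V^*
  0 &= \min_{u} \Bigl[ \nabla V^{*}(x)\,\bigl( f(x) + g(x)\,u \bigr)
      + Q(x) + u^{\top} R\, u \Bigr], \\
  % Minimizer: set the derivative of the bracket with respect to u to zero
  u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top}\, \nabla V^{*}(x)^{\top}.
\end{align}
```

In this form the drift f appears explicitly in the HJB equation, so evaluating its residual requires either knowledge of f or an estimate of it. That is consistent with the roles described in the summary: the critic approximates V*, the actor approximates u*, and the identifier supplies an estimate of the uncertain dynamics.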