Optimized Backstepping Tracking Control Using Reinforcement Learning for a Class of Stochastic Nonlinear Strict-Feedback Systems

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 3, pp. 1291-1303
Main Authors: Wen, Guoxing; Xu, Liguang; Li, Bin
Format: Journal Article
Language: English
Published: United States, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2023

Summary: In this article, an optimized backstepping (OB) control scheme is proposed for a class of stochastic nonlinear strict-feedback systems with unknown dynamics by using a reinforcement learning (RL) strategy with an identifier-critic-actor architecture, where the identifier compensates for the unknown dynamics, the critic evaluates the control performance and provides feedback to the actor, and the actor performs the control action. The basic control idea is that all virtual controls and the actual control of backstepping are designed as the optimized solutions of the corresponding subsystems, so that the entire backstepping control is optimized. Unlike the deterministic case, stochastic system control must account not only for the stochastic disturbance described by the Wiener process but also for the Hessian term in the stability analysis. If the backstepping control were developed on the basis of the published RL optimization methods, it would be difficult to achieve because, on the one hand, the RL algorithms of these methods are very complex, since their critic and actor updating laws are derived from the negative gradient of the square of the approximated Hamilton-Jacobi-Bellman (HJB) equation; on the other hand, these methods require persistent excitation and known dynamics, where persistent excitation is needed to train the adaptive parameters sufficiently. In this research, both critic and actor updating laws are derived from the negative gradient of a simple positive function, which is obtained from a partial derivative of the HJB equation. As a result, the RL algorithm is significantly simplified, and the requirements of persistent excitation and known dynamics are removed. Therefore, the proposed scheme is a natural choice for stochastic optimal control. Finally, both theory and simulation demonstrate that the proposed control achieves the desired system performance.
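To picture the identifier-critic-actor structure and the negative-gradient style of adaptation described in the summary, the following Python sketch simulates a scalar stochastic subsystem with radial-basis-function (RBF) approximators. It is an illustrative sketch only, not the authors' algorithm: the plant, the RBF basis, the gains, and the specific weight-update expressions are hypothetical placeholders chosen to show critic and actor weights adapted along the negative gradient of simple quadratic positive functions, rather than of the squared HJB residual.

```python
# Illustrative sketch (not the paper's exact update laws): a first-order
# stochastic subsystem dx = (f(x) + u) dt + g(x) dw controlled with an
# identifier-critic-actor structure.  All functions and gains below are
# hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):            # unknown drift (hidden from the controller)
    return -0.5 * x + 0.2 * np.sin(x)

def g_true(x):            # diffusion gain multiplying the Wiener increment
    return 0.1 * np.cos(x)

# shared RBF basis for identifier, critic, and actor
centers = np.linspace(-2.0, 2.0, 9)
def phi(x):
    return np.exp(-(x - centers) ** 2 / 0.5)

W_id = np.zeros_like(centers)          # identifier: f_hat(z)  = W_id @ phi(z)
W_c  = 0.5 * np.ones_like(centers)     # critic weights (arbitrary nonzero start)
W_a  = 1.0 * np.ones_like(centers)     # actor weights  (arbitrary nonzero start)

gamma_id, gamma_c, gamma_a = 5.0, 2.0, 2.0   # adaptation gains (hypothetical)
dt, T = 1e-3, 10.0
x, x_ref = 1.5, 0.0                    # state and constant reference

for _ in range(int(T / dt)):
    z   = x - x_ref                    # tracking error
    p   = phi(z)
    f_h = W_id @ p                     # identifier estimate of the drift
    # actor control: error feedback + identifier compensation + actor term
    u   = -z - f_h - 0.5 * (W_a @ p)

    # propagate the stochastic plant with an Euler-Maruyama step
    dw = np.sqrt(dt) * rng.standard_normal()
    x_next = x + (f_true(x) + u) * dt + g_true(x) * dw

    # identifier: gradient descent on 0.5*(f_hat - observed drift)^2, with a
    # crude finite-difference drift estimate as the (noisy) teaching signal
    drift_obs = (x_next - x) / dt - u
    W_id -= gamma_id * (f_h - drift_obs) * p * dt

    # critic and actor: negative gradient of simple quadratic positive
    # functions of the weights, the simplification highlighted in the abstract
    W_c -= gamma_c * (W_c @ p) * p * dt
    W_a -= gamma_a * ((W_a - W_c) @ p) * p * dt

    x = x_next

print(f"final tracking error: {x - x_ref:.4f}")
```

The point of the sketch is structural: the identifier removes the need for known dynamics, while the critic and actor adapt by descending simple positive functions instead of the squared HJB approximation, which is what makes the resulting algorithm lightweight.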
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2021.3105176