Optimized Backstepping Tracking Control Using Reinforcement Learning for a Class of Stochastic Nonlinear Strict-Feedback Systems

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 34, No. 3, pp. 1291-1303
Main Authors: Wen, Guoxing; Xu, Liguang; Li, Bin
Format: Journal Article
Language: English
Published: United States, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2023

Summary: In this article, an optimized backstepping (OB) control scheme is proposed for a class of stochastic nonlinear strict-feedback systems with unknown dynamics by using a reinforcement learning (RL) strategy with an identifier-critic-actor architecture, where the identifier compensates for the unknown dynamics, the critic evaluates the control performance and provides feedback to the actor, and the actor performs the control action. The basic control idea is that all virtual controls and the actual control of backstepping are designed as the optimized solutions of the corresponding subsystems, so that the entire backstepping control is optimized. Unlike the deterministic case, stochastic system control must account not only for the stochastic disturbance described by the Wiener process but also for the Hessian term in the stability analysis. If the backstepping control were developed on the basis of the published RL optimization methods, it would be difficult to achieve because, on the one hand, the RL algorithms of these methods are very complex, since their critic and actor updating laws are derived from the negative gradient of the square of the approximated Hamilton-Jacobi-Bellman (HJB) equation; on the other hand, these methods require persistent excitation and known dynamics, where persistent excitation is needed to train the adaptive parameters sufficiently. In this research, both critic and actor updating laws are derived from the negative gradient of a simple positive function, which is obtained from a partial derivative of the HJB equation. As a result, the RL algorithm is significantly simplified, and the requirements of persistent excitation and known dynamics are removed. Therefore, the proposed scheme is a natural choice for stochastic optimal control. Finally, both theory and simulation demonstrate that the proposed control achieves the desired system performance.
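To picture the identifier-critic-actor structure and the negative-gradient style of adaptation described in the summary, the following Python sketch simulates a scalar stochastic subsystem with radial-basis-function (RBF) approximators. It is an illustrative sketch only, not the authors' algorithm: the plant, the RBF basis, the gains, and the specific weight-update expressions are hypothetical placeholders chosen to show critic and actor weights adapted along the negative gradient of simple quadratic positive functions, rather than of the squared HJB residual.

```python
# Illustrative sketch (not the paper's exact update laws): a first-order
# stochastic subsystem dx = (f(x) + u) dt + g(x) dw controlled with an
# identifier-critic-actor structure.  All functions and gains below are
# hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):            # unknown drift (hidden from the controller)
    return -0.5 * x + 0.2 * np.sin(x)

def g_true(x):            # diffusion gain multiplying the Wiener increment
    return 0.1 * np.cos(x)

# shared RBF basis for identifier, critic, and actor
centers = np.linspace(-2.0, 2.0, 9)
def phi(x):
    return np.exp(-(x - centers) ** 2 / 0.5)

W_id = np.zeros_like(centers)          # identifier: f_hat(z)  = W_id @ phi(z)
W_c  = 0.5 * np.ones_like(centers)     # critic weights (arbitrary nonzero start)
W_a  = 1.0 * np.ones_like(centers)     # actor weights  (arbitrary nonzero start)

gamma_id, gamma_c, gamma_a = 5.0, 2.0, 2.0   # adaptation gains (hypothetical)
dt, T = 1e-3, 10.0
x, x_ref = 1.5, 0.0                    # state and constant reference

for _ in range(int(T / dt)):
    z   = x - x_ref                    # tracking error
    p   = phi(z)
    f_h = W_id @ p                     # identifier estimate of the drift
    # actor control: error feedback + identifier compensation + actor term
    u   = -z - f_h - 0.5 * (W_a @ p)

    # propagate the stochastic plant with an Euler-Maruyama step
    dw = np.sqrt(dt) * rng.standard_normal()
    x_next = x + (f_true(x) + u) * dt + g_true(x) * dw

    # identifier: gradient descent on 0.5*(f_hat - observed drift)^2, with a
    # crude finite-difference drift estimate as the (noisy) teaching signal
    drift_obs = (x_next - x) / dt - u
    W_id -= gamma_id * (f_h - drift_obs) * p * dt

    # critic and actor: negative gradient of simple quadratic positive
    # functions of the weights, the simplification highlighted in the abstract
    W_c -= gamma_c * (W_c @ p) * p * dt
    W_a -= gamma_a * ((W_a - W_c) @ p) * p * dt

    x = x_next

print(f"final tracking error: {x - x_ref:.4f}")
```

The point of the sketch is structural: the identifier removes the need for known dynamics, while the critic and actor adapt by descending simple positive functions instead of the squared HJB approximation, which is what makes the resulting algorithm lightweight.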
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2021.3105176