Learning‐based T‐sHDP(λ) for optimal control of a class of nonlinear discrete‐time systems


Bibliographic Details
Published in	International Journal of Robust and Nonlinear Control, Vol. 32, no. 5, pp. 2624-2643
Main Authors	Yu, Luyang; Liu, Weibo; Liu, Yurong; Alsaadi, Fawaz E.
Format	Journal Article
Language	English
Published	Bognor Regis: Wiley Subscription Services, Inc., 25.03.2022
Subjects	Algorithms; Control systems; Convergence; Discrete time systems; Dynamic programming; eligibility traces (ET); heuristic dynamic programming (HDP); learning‐based optimal control; Machine learning; Neural networks; Nonlinear systems; Optimal control; value iteration
Summary:	This article investigates the optimal control problem for a class of nonlinear discrete‐time systems via reinforcement learning. The nonlinear system under consideration is assumed to be partially unknown. A new learning‐based algorithm, T‐step heuristic dynamic programming with eligibility traces (T‐sHDP(λ)), is proposed to tackle the optimal control problem for such partially unknown systems. First, the optimal control problem is transformed into an equivalent problem, namely solving a Bellman equation. Then, T‐sHDP(λ) is used to obtain an approximate solution of the Bellman equation, and a rigorous convergence analysis is conducted. Instead of the commonly used single‐step update, T‐sHDP(λ) stores past returns over a finite number of steps by introducing a parameter, and then uses this knowledge to update the value function (VF) at multiple time instants synchronously, so as to achieve a higher convergence speed. To implement T‐sHDP(λ), a neural network‐based actor‐critic architecture is applied to approximate the VF and the optimal control scheme. Finally, the feasibility of the algorithm is demonstrated by two illustrative simulation examples.
Bibliography:	Funding information: Key Laboratory of Advanced Perception and Intelligent Control of High‐end Equipment, Ministry of Education, Anhui Polytechnic University (GDSC202014); National Natural Science Foundation of China (61773017, 61873230, 62173292); The Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia (FP‐115‐43)
ISSN:	1049-8923, 1099-1239
DOI:	10.1002/rnc.5847
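
Illustrative note: The summary describes blending returns over a finite number of steps through a trace parameter rather than using a single-step update. This record does not give the article's update rule, so the sketch below is not the authors' T‐sHDP(λ). It shows the standard truncated λ-return from the reinforcement learning literature, written for stage costs, as one way to picture a T-step, λ-weighted target. The function names, the discount factor gamma, and the inputs costs and next_values are assumptions made for illustration.

```python
def n_step_return(costs, next_values, gamma, n):
    """Discounted sum of the first n stage costs plus a bootstrapped tail value.

    costs[k] is the stage cost at step k; next_values[k] is the current
    value-function estimate at the state reached after step k.
    (Hypothetical helper, not taken from the article.)
    """
    partial = sum(gamma**k * costs[k] for k in range(n))
    return partial + gamma**n * next_values[n - 1]


def truncated_lambda_return(costs, next_values, gamma, lam):
    """Blend the 1..T-step returns with geometric weights in lam, T = len(costs).

    lam = 0 recovers the single-step target; lam = 1 recovers the full
    T-step return. (Textbook truncated lambda-return, used as an assumption.)
    """
    T = len(costs)
    blended = sum(
        (1 - lam) * lam**(n - 1) * n_step_return(costs, next_values, gamma, n)
        for n in range(1, T)
    )
    # The longest return receives all remaining weight so the weights sum to 1.
    return blended + lam**(T - 1) * n_step_return(costs, next_values, gamma, T)


# Example call with made-up numbers: three stored steps, so T = 3.
target = truncated_lambda_return([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], gamma=0.5, lam=0.5)
```

In this sketch the resulting target would be used to update the value estimate of the starting state; an actor‐critic implementation would instead fit a critic network to such targets.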