Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems

Bibliographic Details
Published in: International Journal of Adaptive Control and Signal Processing, Vol. 29, No. 4, pp. 473–493
Main Authors: Yasini, Sholeh; Karimpour, Ali; Naghibi Sistani, Mohammad-Bagher; Modares, Hamidreza
Format: Journal Article
Language: English
Published: Bognor Regis: Blackwell Publishing Ltd; Wiley Subscription Services, Inc., 01.04.2015
Summary: Online adaptive optimal control methods based on reinforcement learning typically require the persistence of excitation (PE) condition, which must be known a priori for the algorithm to converge. However, this condition is often infeasible to implement or monitor online. This paper proposes an online concurrent reinforcement learning algorithm (CRLA) based on neural networks (NNs) to solve the H∞ control problem for partially unknown continuous-time systems, in which the need for the PE condition is relaxed by using the idea of concurrent learning. First, the H∞ control problem is formulated as a two-player zero-sum game; then, the online CRLA is employed to approximate the optimal value function and the Nash equilibrium of the game. The proposed algorithm is implemented on an actor–critic–disturbance NN approximator structure to solve the Hamilton–Jacobi–Isaacs (HJI) equation online, forward in time. During the execution of the algorithm, the control input, acting as one player, seeks the optimal control, while the other player, the disturbance, seeks the worst-case disturbance. Novel update laws are derived for adapting the critic and actor NN weights. Stability of the closed-loop system is guaranteed via a Lyapunov technique, and convergence to the Nash solution of the game is established. Simulation results show the effectiveness of the proposed method. Copyright © 2014 John Wiley & Sons, Ltd.
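To make the zero-sum game formulation in the abstract concrete, the following is a minimal sketch, not from the paper, of the scalar linear-quadratic special case. For dynamics x' = a*x + b*u + d*w with payoff ∫(q*x² + r*u² − γ²*w²) dt, the HJI equation reduces to a scalar game Riccati equation whose positive root gives the value V(x) = p*x² and the saddle-point (Nash) policies of both players. All symbol names and numeric values here are illustrative assumptions.

```python
import math

def solve_scalar_game(a, b, d, q, r, gamma):
    """Solve the scalar game Riccati equation
        2*a*p + q - (b**2/r - d**2/gamma**2) * p**2 = 0
    and return the value parameter p plus the saddle-point gains."""
    c = b**2 / r - d**2 / gamma**2  # net "control minus disturbance" effectiveness
    if c <= 0:
        raise ValueError("gamma too small: disturbance dominates, no saddle point")
    # positive root of c*p^2 - 2*a*p - q = 0
    p = (2 * a + math.sqrt(4 * a**2 + 4 * c * q)) / (2 * c)
    k_u = -(b / r) * p          # minimizing player: optimal control u* = k_u * x
    k_w = (d / gamma**2) * p    # maximizing player: worst-case disturbance w* = k_w * x
    return p, k_u, k_w

# Illustrative unstable plant (a = 1) with attenuation level gamma = 2
p, k_u, k_w = solve_scalar_game(a=1.0, b=1.0, d=0.5, q=1.0, r=1.0, gamma=2.0)
a_closed = 1.0 + 1.0 * k_u + 0.5 * k_w  # closed-loop pole under both saddle policies
print(p, k_u, k_w, a_closed)
```

Even under the worst-case disturbance policy, the closed-loop pole `a_closed` is negative, which is the stability guarantee the game-theoretic H∞ formulation provides; the paper's contribution is obtaining the analogous nonlinear solution online with NN approximators rather than in closed form.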
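The concurrent-learning idea that lets the paper relax the PE condition can be illustrated on a simple parameter-estimation toy problem, again as a hedged sketch rather than the paper's actual update laws: a plain gradient update needs the regressor to remain exciting forever, whereas a concurrent-learning update also replays a recorded stack of past data, so convergence only requires that the *recorded* data be sufficiently rich. All details (regressor shape, learning rate, memory schedule) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = np.array([1.5, -0.7])  # unknown weights to identify: y = W_true . phi

def regressor(t):
    # exciting early on, then collapses to a constant direction (PE is lost)
    if t < 50:
        return np.array([np.sin(0.1 * t), np.cos(0.13 * t)])
    return np.array([1.0, 0.0])

lr = 0.05
W_grad = np.zeros(2)  # plain gradient update (instantaneous data only)
W_cl = np.zeros(2)    # concurrent-learning update (instantaneous + recorded data)
memory = []           # recorded (phi, y) pairs collected during the exciting phase

for t in range(400):
    phi = regressor(t)
    y = W_true @ phi
    # plain gradient step on the instantaneous estimation error
    W_grad += lr * phi * (y - W_grad @ phi)
    # concurrent learning: the same step, plus replay of the stored stack
    W_cl += lr * phi * (y - W_cl @ phi)
    if t < 50 and t % 10 == 0:
        memory.append((phi.copy(), y))
    for phi_j, y_j in memory:
        W_cl += lr * phi_j * (y_j - W_cl @ phi_j)

print(np.linalg.norm(W_grad - W_true), np.linalg.norm(W_cl - W_true))
```

Once the regressor stops exciting, the plain gradient estimate can no longer correct its error along the unexcited direction, while the concurrent-learning estimate keeps converging because the stored stack spans the parameter space. The paper applies this same mechanism to the critic and actor NN weight updates.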
Bibliography: ArticleID: ACS2485
ISSN: 0890-6327
EISSN: 1099-1115
DOI: 10.1002/acs.2485