Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems

Bibliographic Details
Published in	Automatica (Oxford) Vol. 48; no. 11; pp. 2850 - 2859
Main Authors	Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho
Format	Journal Article
Language	English
Published	Kidlington: Elsevier Ltd, 01.11.2012
Subjects	Adaptative systems, Adaptive control, Algorithms, Applied sciences, Computer science; control theory; systems, Control system synthesis, Control theory. Systems, Dynamical systems, Dynamics, Exact sciences and technology, Exploration, Integrals, LQR, Optimal control, Optimization, Optimization under uncertainties, Policies, Policy iteration, Polyimide resins, Q-learning, Singular perturbation, Optimal policy, Control synthesis, Iterative method, Continuous time, Linear time, Dynamical system, Singular control, Parabolic equation, Linear time invariant system, Uncertain system, LQ control, Continuous control, Quadratic control, Invarying system, Learning algorithm, Reinforcement learning, Real time
Summary:	This paper proposes an integral Q-learning algorithm for continuous-time (CT) linear time-invariant (LTI) systems. The algorithm solves a linear quadratic regulation (LQR) problem in real time for a given system and value function, without knowledge of the system dynamics A and B. Here, Q-learning refers to a family of reinforcement learning methods that find the optimal policy by interacting with an uncertain environment. In developing the algorithm, we first derive an explorized policy iteration (PI) method that can handle known exploration signals. The integral Q-learning algorithm for CT LTI systems is then derived from this PI method and from variants of Q-functions obtained by singular perturbation of the control input. The proposed Q-learning scheme evaluates the current value function and the improved control policy at the same time, and is proven to be stable and convergent to the LQ optimal solution, provided that the initial policy is stabilizing. For the proposed algorithms, practical online implementation methods are investigated in terms of persistency of excitation (PE) and exploration. Finally, simulation results are provided for comparison and verification of the performance.
ISSN:	0005-1098 1873-2836
DOI:	10.1016/j.automatica.2012.06.008
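
For readers unfamiliar with the policy iteration that the Summary builds on, the sketch below shows the classical model-based version for CT LQR (often called Kleinman's algorithm). It is not the paper's integral Q-learning, which by the Summary's account avoids using A and B; here both matrices are assumed known so that the evaluate/improve loop and the stabilizing-initial-policy requirement are easy to see. The function names, the double-integrator example, and the initial gain are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def solve_lyapunov(a_cl, m):
    """Solve a_cl.T @ P + P @ a_cl + m = 0 for symmetric P (Kronecker form)."""
    n = a_cl.shape[0]
    eye = np.eye(n)
    # Vectorized Lyapunov operator; nonsingular when a_cl is Hurwitz.
    lhs = np.kron(eye, a_cl.T) + np.kron(a_cl.T, eye)
    p = np.linalg.solve(lhs, -m.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # remove round-off asymmetry


def policy_iteration(a, b, q, r, k0, iters=20):
    """Model-based policy iteration for CT LQR (assumes A and B are known)."""
    k = k0  # must be stabilizing: a - b @ k0 is Hurwitz
    for _ in range(iters):
        a_cl = a - b @ k
        # Policy evaluation: value matrix of the current gain.
        p = solve_lyapunov(a_cl, q + k.T @ r @ k)
        # Policy improvement: new gain from the evaluated value matrix.
        k = np.linalg.solve(r, b.T @ p)
    return p, k


# Illustrative example (assumption): double integrator with unit weights.
a = np.array([[0.0, 1.0], [0.0, 0.0]])
b = np.array([[0.0], [1.0]])
q = np.eye(2)
r = np.array([[1.0]])
k0 = np.array([[1.0, 1.0]])  # stabilizing initial gain

p, k = policy_iteration(a, b, q, r, k0)
```

For this example the Riccati equation can be solved by hand, giving the gain K = [1, √3], so the iterates can be checked against a closed-form answer.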