Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Convergence Analysis

Bibliographic Details
Published in	IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 48, no. 6, pp. 875-891
Main Authors	Wei, Qinglai; Lewis, Frank L.; Liu, Derong; Song, Ruizhuo; Lin, Hanquan
Format	Journal Article
Language	English
Published	New York: IEEE, 01.06.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adaptive algorithms; Adaptive critic designs; adaptive dynamic programming (ADP); Aerospace electronics; Algorithms; approximate dynamic programming; Approximation algorithms; Computer simulation; Control theory; Convergence; Dynamic programming; Iterative algorithms; Iterative methods; local iteration; Machine learning; neural networks; neuro-dynamic programming; Nonlinear systems; Optimal control
Summary:	This paper establishes convergence properties for the recently developed discrete-time local value iteration adaptive dynamic programming (ADP) algorithm. The local iterative ADP algorithm can be initialized with an arbitrary positive semidefinite function. By employing a state-dependent learning rate function, the iterative value function and iterative control law can, for the first time, be updated in a subset of the state space instead of the whole state space, which effectively reduces the computational burden. A new convergence analysis method is developed to prove that the iterative value functions converge to the optimum under some mild constraints. A monotonicity property of the local value iteration ADP algorithm is also presented: under certain conditions on the initial value function and the learning rate function, the iterative value function converges monotonically to the optimum. Finally, three simulation examples and comparisons illustrate the performance of the developed algorithm.
ISSN:	2168-2216, 2168-2232
DOI:	10.1109/TSMC.2016.2623766
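
The local update described in the summary can be illustrated with a small tabular sketch. This is not the authors' implementation: it assumes the update takes the blended form V_{i+1}(x) = (1 - alpha_i(x)) V_i(x) + alpha_i(x) min_u [U(x, u) + V_i(F(x, u))], where alpha_i is the state-dependent learning rate, and it uses a hypothetical five-state chain with stage cost x^2 + u^2. The names `local_value_iteration` and `alternating_alpha` are illustrative only. Setting alpha_i(x) = 0 leaves a state untouched in that sweep, so each iteration updates only a subset of the state space.

```python
import numpy as np


def local_value_iteration(next_state, cost, v0, alpha, n_iter):
    """Tabular local value iteration (illustrative sketch, assumed update form).

    next_state[s, a]: index of the deterministic successor state
    cost[s, a]:       nonnegative stage cost U(x, u)
    v0[s]:            positive semidefinite initial value function
    alpha(i):         array of learning rates in [0, 1], one per state
    """
    v = np.asarray(v0, dtype=float).copy()
    for i in range(n_iter):
        # One-step lookahead: min over actions of U(x, u) + V_i(F(x, u)).
        target = np.min(cost + v[next_state], axis=1)
        a = alpha(i)
        # States with a == 0 keep their old value (no update this sweep).
        v = (1.0 - a) * v + a * target
    return v


def alternating_alpha(i, n_states):
    """Hypothetical learning rate: update even states on even sweeps,
    odd states on odd sweeps."""
    a = np.zeros(n_states)
    a[i % 2::2] = 1.0
    return a


# Hypothetical toy system: states 0..4, action 0 = stay, action 1 = step left.
n_states = 5
states = np.arange(n_states)
next_state = np.stack([states, np.maximum(states - 1, 0)], axis=1)
cost = np.stack([states**2, states**2 + 1], axis=1).astype(float)

v = local_value_iteration(
    next_state,
    cost,
    v0=np.zeros(n_states),  # zero is a valid positive semidefinite start
    alpha=lambda i: alternating_alpha(i, n_states),
    n_iter=20,
)
print(v)
```

In this toy problem the optimal policy steps left until state 0, so the optimal cost from state s is the sum of k^2 + 1 for k = 1..s. The alternating schedule is only one possible choice of learning rate; the conditions under which such schedules converge are the subject of the paper itself and are not reproduced here.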