A data-driven online ADP of exponential convergence based on k-nearest-neighbor Averager, stable term and persistence excitation

Bibliographic Details
Published in	2017 4th International Conference on Systems and Informatics (ICSAI), pp. 1-6
Main Authors	Zhijian Huang, Shengtang Wang, Huan Zheng, Cheng Zhang, Guichen Zhang, Qili Wu, Qinmin Tan, Zhiyuan Yang
Format	Conference Proceeding
Language	English
Published	IEEE 01.11.2017
Subjects	approximate dynamic programming; exponential convergence; k-nearest-neighbor; persistence excitation
Summary:	Many complex systems in marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine, and management science pose optimization and control problems. Approximate dynamic programming (ADP), an approximate optimization approach that has emerged in recent years, addresses the curse of dimensionality of dynamic programming. Based on an analysis of the optimization system, this paper proposes a nonlinear, multi-input multi-output, online-learning, data-driven ADP structure and its learning algorithm. The method has three parts: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor method; 2) the multi-output policy iteration of the actor module is computed with exponential convergence; 3) the critic and actor modules learn synchronously, achieving online optimal control. The optimal control of the longitudinal motion of a thermal underwater glider is used to demonstrate the proposed method. This work can lay a foundation for the theory and application of nonlinear, data-driven, multi-input multi-output ADP. It also addresses a common need in optimization control and artificial intelligence across many scientific and engineering fields, such as energy conservation, emission reduction, decision support, and operational management.
DOI:	10.1109/ICSAI.2017.8248254
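
Illustrative note: the summary says the critic function is approximated with a data-driven k-nearest-neighbor method. The record gives no details of the paper's averager, stable term, or persistence excitation, so the following is only a minimal generic sketch of a k-nearest-neighbor averager, assuming Euclidean distance and an unweighted mean. The class name KNNAveragerCritic, its methods, and the sample data are hypothetical and are not taken from the paper.

```python
import math


class KNNAveragerCritic:
    """Hypothetical k-NN averager (not the paper's implementation).

    Estimates the value of a state as the unweighted mean of the stored
    values of its k nearest sampled states (Euclidean distance assumed).
    """

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # list of (state, value) pairs observed so far

    def add(self, state, value):
        """Store one observed (state, value) sample."""
        self.samples.append((tuple(state), float(value)))

    def predict(self, state):
        """Return the mean value of the k nearest stored samples."""
        if not self.samples:
            return 0.0  # assumed default before any data has arrived
        query = tuple(state)
        nearest = sorted(
            self.samples, key=lambda s: math.dist(s[0], query)
        )[: self.k]
        return sum(value for _, value in nearest) / len(nearest)


# Usage with made-up two-dimensional states and values.
critic = KNNAveragerCritic(k=2)
critic.add([0.0, 0.0], 1.0)
critic.add([1.0, 0.0], 3.0)
critic.add([10.0, 10.0], 100.0)
estimate = critic.predict([0.4, 0.0])  # averages the two closest samples
```

Because the estimate is built directly from stored samples, new data can be added online without retraining a parametric model, which is the sense in which such an approximator is data-driven.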