A data-driven online ADP of exponential convergence based on k-nearest-neighbor Averager, stable term and persistence excitation
Published in | 2017 4th International Conference on Systems and Informatics (ICSAI) pp. 1 - 6 |
---|---|
Main Authors | , , , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.11.2017 |
Summary: | With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine and management science, many complex systems face optimization and control problems. Approximate dynamic programming, an approximate optimization approach that has emerged in recent years, addresses the curse of dimensionality of dynamic programming. Based on an analysis of the optimization system, this paper proposes a nonlinear, multi-input multi-output, online-learning, data-driven approximate dynamic programming structure and its learning algorithm. The method has three aspects: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor method; 2) the multi-output policy iteration of the actor module is calculated with exponential convergence; 3) the critic and actor modules are learned synchronously to achieve online optimization and control. Optimal control of the longitudinal motion of a thermal underwater glider is used to demonstrate the proposed method. This work can lay a foundation for the theory and application of nonlinear, data-driven, multi-input multi-output approximate dynamic programming. It also addresses a common need in optimization control and artificial intelligence across many scientific and engineering fields, such as energy conservation, emission reduction, decision support and operational management. |
---|---|
DOI: | 10.1109/ICSAI.2017.8248254 |
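
The summary says the critic function is approximated with a data-driven k-nearest-neighbor method. As a rough illustration of that general idea only, and not the paper's algorithm, the sketch below estimates a value at a query state by averaging the stored values of its k nearest sampled states. The function name `knn_average`, the Euclidean distance metric, the unweighted mean, and the sample data are all assumptions made for this example.

```python
import math


def knn_average(samples, query, k=3):
    """Estimate a value at `query` as the mean of the k nearest stored values.

    samples: list of (state, value) pairs, each state a tuple of floats.
    This is a generic k-nearest-neighbor averager, assumed here for
    illustration; the paper's actual critic update is not reproduced.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Rank stored states by Euclidean distance to the query state.
    nearest = sorted(samples, key=lambda sv: math.dist(sv[0], query))[:k]
    # Unweighted mean of the neighbors' values (an assumed choice).
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical 2-D states with made-up cost-to-go values.
data = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((5.0, 5.0), 10.0),
]
estimate = knn_average(data, (0.1, 0.1), k=3)
```

Because the estimate is a plain average of stored values, it needs no parametric model of the critic, which is one common reading of "data-driven" in this setting.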