Optimal and Autonomous Control Using Reinforcement Learning: A Survey

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, no. 6, pp. 2042-2062
Main Authors	Kiumarsi, Bahare; Vamvoudakis, Kyriakos G.; Modares, Hamidreza; Lewis, Frank L.
Format	Journal Article
Language	English
Published	United States; IEEE; 01.06.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithm design and analysis; Algorithms; Approximation algorithms; Autonomy; Computer & video games; data-based optimization; Feedback control; Games; H-infinity control; Heuristic algorithms; Learning; Learning (artificial intelligence); Machine learning; Multiagent systems; Optimal control; Reinforcement; reinforcement learning (RL); State-of-the-art reviews; System dynamics
Abstract	This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal H2 and H∞ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
Author_xml	1. Kiumarsi, Bahare (ORCID: 0000-0002-9701-8375; email: b_kiomarsi@yahoo.com), UTA Research Institute, University of Texas at Arlington, Arlington, TX, USA
2. Vamvoudakis, Kyriakos G. (ORCID: 0000-0003-1978-4848; email: kyriakos@vt.edu), Kevin T. Crofton Department of Aerospace and Ocean Engineering, Virginia Tech, Blacksburg, VA, USA
3. Modares, Hamidreza (ORCID: 0000-0003-0800-5140; email: modaresh@mst.edu), Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA
4. Lewis, Frank L. (ORCID: 0000-0003-4074-1615; email: lewis@uta.edu), UTA Research Institute, University of Texas at Arlington, Arlington, TX, USA
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/29771662 (MEDLINE/PubMed record)
CODEN	ITNNAL
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI	10.1109/TNNLS.2017.2773458
Discipline	Computer Science
EISSN	2162-2388
EndPage	2062
ExternalDocumentID	29771662 10_1109_TNNLS_2017_2773458 8169685
Genre	orig-research Research Support, U.S. Gov't, Non-P.H.S Research Support, Non-U.S. Gov't Journal Article
GrantInformation_xml	– fundername: ONR grantid: N00014-17-1-2239 – fundername: U.S. NSF grantid: ECCS-1405173 – fundername: NATO through the Virginia Tech Startup Fund grantid: SPS G5176 – fundername: China NSFC grantid: 61633007
ISSN	2162-237X 2162-2388
IsPeerReviewed	false
IsScholarly	true
Issue	6
Language	English
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
ORCID	0000-0003-4074-1615 0000-0003-1978-4848 0000-0002-9701-8375 0000-0003-0800-5140
PMID	29771662
PQID	2174547689
PQPubID	85436
PageCount	21
PublicationCentury	2000
PublicationDate	2018-06-01
PublicationDateYYYYMMDD	2018-06-01
PublicationDate_xml	– month: 06 year: 2018 text: 2018-06-01 day: 01
PublicationDecade	2010
PublicationPlace	United States
PublicationPlace_xml	– name: United States – name: Piscataway
PublicationTitle	IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev	TNNLS
PublicationTitleAlternate	IEEE Trans Neural Netw Learn Syst
PublicationYear	2018
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml	– name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage	2042
Title	Optimal and Autonomous Control Using Reinforcement Learning: A Survey
URI	https://ieeexplore.ieee.org/document/8169685 https://www.ncbi.nlm.nih.gov/pubmed/29771662 https://www.proquest.com/docview/2174547689 https://www.proquest.com/docview/2041625740
Volume	29