Optimal and Autonomous Control Using Reinforcement Learning: A Survey

Bibliographic Details
Published in IEEE Transactions on Neural Networks and Learning Systems Vol. 29; no. 6; pp. 2042-2062
Main Authors Kiumarsi, Bahare, Vamvoudakis, Kyriakos G., Modares, Hamidreza, Lewis, Frank L.
Format Journal Article
Language English
Published United States IEEE 01.06.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract This paper reviews the current state of the art on reinforcement learning (RL)-based feedback control solutions to optimal regulation and tracking of single and multiagent systems. Existing RL solutions to both optimal $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control problems, as well as graphical games, will be reviewed. RL methods learn the solution to optimal control and game problems online and using measured data along the system trajectories. We discuss Q-learning and the integral RL algorithm as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively. Moreover, we discuss a new direction of off-policy RL for both CT and DT systems. Finally, we review several applications.
Author Vamvoudakis, Kyriakos G.
Lewis, Frank L.
Modares, Hamidreza
Kiumarsi, Bahare
Author_xml – sequence: 1
  givenname: Bahare
  orcidid: 0000-0002-9701-8375
  surname: Kiumarsi
  fullname: Kiumarsi, Bahare
  email: b_kiomarsi@yahoo.com
  organization: UTA Research Institute, University of Texas at Arlington, Arlington, TX, USA
– sequence: 2
  givenname: Kyriakos G.
  orcidid: 0000-0003-1978-4848
  surname: Vamvoudakis
  fullname: Vamvoudakis, Kyriakos G.
  email: kyriakos@vt.edu
  organization: Kevin T. Crofton Department of Aerospace and Ocean Engineering, Virginia Tech, Blacksburg, VA, USA
– sequence: 3
  givenname: Hamidreza
  orcidid: 0000-0003-0800-5140
  surname: Modares
  fullname: Modares, Hamidreza
  email: modaresh@mst.edu
  organization: Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA
– sequence: 4
  givenname: Frank L.
  orcidid: 0000-0003-4074-1615
  surname: Lewis
  fullname: Lewis, Frank L.
  email: lewis@uta.edu
  organization: UTA Research Institute, University of Texas at Arlington, Arlington, TX, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/29771662 (view this record in MEDLINE/PubMed)
CODEN ITNNAL
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TNNLS.2017.2773458
Discipline Computer Science
EISSN 2162-2388
EndPage 2062
ExternalDocumentID 29771662
10_1109_TNNLS_2017_2773458
8169685
Genre orig-research
Research Support, U.S. Gov't, Non-P.H.S
Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: ONR
  grantid: N00014-17-1-2239
– fundername: U.S. NSF
  grantid: ECCS-1405173
– fundername: NATO through the Virginia Tech Startup Fund
  grantid: SPS G5176
– fundername: China NSFC
  grantid: 61633007
ISSN 2162-237X
2162-2388
IsPeerReviewed false
IsScholarly true
Issue 6
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0003-4074-1615
0000-0003-1978-4848
0000-0002-9701-8375
0000-0003-0800-5140
PMID 29771662
PQID 2174547689
PQPubID 85436
PageCount 21
PublicationCentury 2000
PublicationDate 2018-06-01
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-06-01
  day: 01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Piscataway
PublicationTitle IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev TNNLS
PublicationTitleAlternate IEEE Trans Neural Netw Learn Syst
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 2042
SubjectTerms Algorithm design and analysis
Algorithms
Approximation algorithms
Autonomy
Computer & video games
data-based optimization
Feedback control
Games
H-infinity control
Heuristic algorithms
Learning
Learning (artificial intelligence)
Machine learning
Multiagent systems
Optimal control
Reinforcement
reinforcement learning (RL)
State-of-the-art reviews
System dynamics
Title Optimal and Autonomous Control Using Reinforcement Learning: A Survey
URI https://ieeexplore.ieee.org/document/8169685
https://www.ncbi.nlm.nih.gov/pubmed/29771662
https://www.proquest.com/docview/2174547689
https://www.proquest.com/docview/2041625740
Volume 29