A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Bibliographic Details
Published in IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews Vol. 42; no. 6; pp. 1291-1307
Main Authors Grondman, I., Busoniu, L., Lopes, G. A. D., Babuska, R.
Format Journal Article
Language English
Published New-York, NY IEEE 01.11.2012
Institute of Electrical and Electronics Engineers
Abstract Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
Author Grondman, I.
Babuska, R.
Lopes, G. A. D.
Busoniu, L.
Author_xml – sequence: 1
  givenname: I.
  surname: Grondman
  fullname: Grondman, I.
  email: i.grondman@tudelft.nl
  organization: Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands
– sequence: 2
  givenname: L.
  surname: Busoniu
  fullname: Busoniu, L.
  email: lucian@busoniu.net
  organization: CRAN, Univ. de Lorraine, Vandoeuvre, France
– sequence: 3
  givenname: G. A. D.
  surname: Lopes
  fullname: Lopes, G. A. D.
  email: g.a.delgadolopesr@tudelft.nl
  organization: Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands
– sequence: 4
  givenname: R.
  surname: Babuska
  fullname: Babuska, R.
  email: r.buska@tudelft.nl
  organization: Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=26818899 (View record in Pascal Francis)
https://hal.science/hal-00756747 (View record in HAL)
CODEN ITCRFH
ContentType Journal Article
Copyright 2014 INIST-CNRS
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1109/TSMCC.2012.2218595
Discipline Engineering
Sciences (General)
Applied Sciences
Computer Science
EISSN 1558-2442
EndPage 1307
ExternalDocumentID oai_HAL_hal_00756747v1
26818899
10_1109_TSMCC_2012_2218595
6392457
Genre orig-research
ISSN 1094-6977
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords Concept learning
Action
Gradient
Finance
Reinforcement learning
Optimal policy
policy gradient
Algorithmics
Actor-critic
State space
State space method
Function approximation
Robotics
Variance
reinforcement learning (RL)
Search algorithm
Gradient descent
natural gradient
Biomimetics
Power control
Learning algorithm
Artificial intelligence
License CC BY 4.0
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
OpenAccessLink https://hal.science/hal-00756747
PageCount 17
PublicationDate 2012-11-01
PublicationPlace New-York, NY
PublicationTitle IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews
PublicationTitleAbbrev TSMCC
PublicationYear 2012
Publisher IEEE
Institute of Electrical and Electronics Engineers
SecondaryResourceType review_article
Snippet Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to...
StartPage 1291
SubjectTerms Actor-critic
Algorithmics. Computability. Computer arithmetics
Algorithms
Applied sciences
Approximation algorithms
Approximation methods
Artificial intelligence
Automatic
Automatic Control Engineering
Computer Science
Computer science; control theory; systems
Control theory. Systems
Convergence
Engineering Sciences
Equations
Estimates
Exact sciences and technology
Learning
Machine Learning
natural gradient
Optimization
Policies
policy gradient
Power control
Reinforcement
reinforcement learning (RL)
Robotics
Searching
Theoretical computing
Title A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
URI https://ieeexplore.ieee.org/document/6392457
https://www.proquest.com/docview/1315674222
https://hal.science/hal-00756747
Volume 42