A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics,...

Bibliographic Details
Published in	IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews Vol. 42; no. 6; pp. 1291 - 1307
Main Authors	Grondman, I., Busoniu, L., Lopes, G. A. D., Babuska, R.
Format	Journal Article
Language	English
Published	New-York, NY: IEEE; Institute of Electrical and Electronics Engineers, 01.11.2012
Subjects	Actor-critic | Algorithmics. Computability. Computer arithmetics | Algorithms | Applied sciences | Approximation algorithms | Approximation methods | Artificial intelligence | Automatic | Automatic Control Engineering | Computer Science | Computer science; control theory; systems | Control theory. Systems | Convergence | Engineering Sciences | Equations | Estimates | Exact sciences and technology | Learning | Machine Learning | natural gradient | Optimization | Policies | policy gradient | Power control | Reinforcement | reinforcement learning (RL) | Robotics | Searching | Theoretical computing
Abstract	Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion on open issues.
Author	Grondman, I. Babuska, R. Lopes, G. A. D. Busoniu, L.
Author_xml	– sequence: 1 givenname: I. surname: Grondman fullname: Grondman, I. email: i.grondman@tudelft.nl organization: Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands – sequence: 2 givenname: L. surname: Busoniu fullname: Busoniu, L. email: lucian@busoniu.net organization: CRAN, Univ. de Lorraine, Vandoeuvre, France – sequence: 3 givenname: G. A. D. surname: Lopes fullname: Lopes, G. A. D. email: g.a.delgadolopesr@tudelft.nl organization: Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands – sequence: 4 givenname: R. surname: Babuska fullname: Babuska, R. email: r.buska@tudelft.nl organization: Delft Center for Syst. & Control, Delft Univ. of Technol., Delft, Netherlands
BackLink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=26818899$$DView record in Pascal Francis https://hal.science/hal-00756747$$DView record in HAL
CODEN	ITCRFH
ContentType	Journal Article
Copyright	2014 INIST-CNRS Distributed under a Creative Commons Attribution 4.0 International License
DOI	10.1109/TSMCC.2012.2218595
DatabaseName	IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Pascal-Francis Computer and Information Systems Abstracts Electronics & Communications Abstracts Mechanical & Transportation Engineering Abstracts Technology Research Database ANTE: Abstracts in New Technology & Engineering Engineering Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access)
DatabaseTitle	CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Engineering Research Database Advanced Technologies Database with Aerospace ANTE: Abstracts in New Technology & Engineering Computer and Information Systems Abstracts Professional
DatabaseTitleList	Technology Research Database
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering Sciences (General) Applied Sciences Computer Science
EISSN	1558-2442
EndPage	1307
ExternalDocumentID	oai_HAL_hal_00756747v1 26818899 10_1109_TSMCC_2012_2218595 6392457
Genre	orig-research
ISSN	1094-6977
IngestDate	Fri May 09 12:24:27 EDT 2025 Sun Aug 24 04:04:32 EDT 2025 Tue Sep 20 22:33:43 EDT 2022 Tue Jul 01 03:52:42 EDT 2025 Thu Apr 24 22:51:13 EDT 2025 Tue Aug 26 17:18:15 EDT 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	6
Keywords	Concept learning Action Gradient Finance Reinforcement learning Optimal policy policy gradient Algorithmics Actor-critic State space State space method Function approximation Robotics Variance reinforcement learning (RL) Search algorithm Gradient descent natural gradient Biomimetics Power control Learning algorithm Artificial intelligence actor-critic reinforcement learning
Language	English
License	CC BY 4.0 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
OpenAccessLink	https://hal.science/hal-00756747
PQID	1315674222
PQPubID	23500
PageCount	17
ParticipantIDs	proquest_miscellaneous_1315674222 crossref_primary_10_1109_TSMCC_2012_2218595 ieee_primary_6392457 hal_primary_oai_HAL_hal_00756747v1 pascalfrancis_primary_26818899 crossref_citationtrail_10_1109_TSMCC_2012_2218595
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2012-11-01
PublicationDecade	2010
PublicationPlace	New-York, NY
PublicationTitle	IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews
PublicationTitleAbbrev	TSMCC
PublicationYear	2012
Publisher	IEEE Institute of Electrical and Electronics Engineers
SecondaryResourceType	review_article
StartPage	1291
SubjectTerms	Actor-critic Algorithmics. Computability. Computer arithmetics Algorithms Applied sciences Approximation algorithms Approximation methods Artificial intelligence Automatic Automatic Control Engineering Computer Science Computer science; control theory; systems Control theory. Systems Convergence Engineering Sciences Equations Estimates Exact sciences and technology Learning Machine Learning natural gradient Optimization Policies policy gradient Power control Reinforcement reinforcement learning (RL) Robotics Searching Theoretical computing
Title	A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients
URI	https://ieeexplore.ieee.org/document/6392457 https://www.proquest.com/docview/1315674222 https://hal.science/hal-00756747
Volume	42