Toward Human-in-the-Loop AI: Enhancing Deep Reinforcement Learning via Real-Time Human Guidance for Autonomous Driving

Bibliographic Details
Published in: Engineering (Beijing, China), Vol. 21, pp. 75–91
Main Authors: Wu, Jingda; Huang, Zhiyu; Hu, Zhongxu; Lv, Chen (lyuchen@ntu.edu.sg)
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 February 2023
ISSN: 2095-8099
DOI: 10.1016/j.eng.2022.05.017
Copyright: 2022 The Author
License: Open access article under the CC BY-NC-ND license
Subjects: Human-in-the-loop AI; Deep reinforcement learning; Human guidance; Autonomous driving
Online Access: https://doaj.org/article/e53531b06dac4965b72fd08c185600ef

Abstract
Due to its limited intelligence and abilities, machine learning currently cannot handle many situations and thus cannot completely replace humans in real-world applications. Because humans exhibit robustness and adaptability in complex scenarios, it is crucial to introduce humans into the training loop of artificial intelligence (AI), leveraging human intelligence to further advance machine learning algorithms. In this study, a real-time human-guidance-based deep reinforcement learning method (Hug-DRL) is developed for policy training in an end-to-end autonomous driving case. With a newly designed mechanism for control transfer between the human and the automation, the human can intervene and correct the agent’s unreasonable actions in real time when necessary during model training. Based on this human-in-the-loop guidance mechanism, an improved actor-critic architecture with modified policy and value networks is developed. The fast convergence of the proposed Hug-DRL allows real-time human guidance to be fused into the agent’s training loop, further improving the efficiency and performance of DRL. The developed method is validated in human-in-the-loop experiments with 40 subjects and compared with other state-of-the-art learning approaches. The results suggest that the proposed method can effectively enhance the training efficiency and performance of the DRL algorithm under human guidance, without imposing specific requirements on participants’ expertise or experience.
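The mechanism the abstract describes — an off-policy actor-critic agent acting in the environment, with a human able to take over control so that the overriding action is fused into the training loop — can be illustrated roughly as follows. This is a minimal sketch under assumed interfaces (agent, env, human, and replay_buffer are all hypothetical names), not the paper’s actual architecture or update rule:

```python
# Minimal sketch of one human-in-the-loop (Hug-DRL-style) training step.
# All interfaces here (agent, env, human, replay_buffer) are hypothetical
# stand-ins, not the paper's actual implementation.

def human_guided_step(agent, env, human, replay_buffer, state):
    """Run one environment step, letting a human override the agent."""
    agent_action = agent.act(state)

    # Control transfer: a human intervention (e.g., the participant
    # turning the steering wheel) overrides the agent's proposed action.
    human_action = human.poll()              # returns None if no input
    intervened = human_action is not None
    action = human_action if intervened else agent_action

    next_state, reward, done, info = env.step(action)

    # Record the *executed* action together with an intervention flag,
    # so the update can treat human-guided transitions specially
    # (e.g., an extra supervised loss pulling the policy toward the
    # human's action, on top of the usual actor-critic objectives).
    replay_buffer.add(state, action, reward, next_state, done, intervened)
    agent.update(replay_buffer)

    return next_state, done
```

The design point the abstract emphasizes is that the executed (possibly human-chosen) action, not the agent’s proposal, is what enters the training data, which is what lets real-time human corrections reshape the learned policy.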