SAMBA: safe model-based & active reinforcement learning

Bibliographic Details
Published in Machine Learning, Vol. 111, No. 1, pp. 173-203
Main Authors Cowen-Rivers, Alexander I., Palenicek, Daniel, Moens, Vincent, Abdullah, Mohammed Amin, Sootla, Aivar, Wang, Jun, Bou-Ammar, Haitham
Format Journal Article
Language English
Published New York Springer US 01.01.2022
Springer Nature B.V
Online Access Get full text
ISSN 0885-6125
EISSN 1573-0565
DOI 10.1007/s10994-021-06103-6

Abstract In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel acquisition functions for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our acquisition functions and safety constraints.
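The abstract refers to conditional-value-at-risk (CVaR) constraints but does not restate them. As a reference point only, the LaTeX sketch below gives the standard Rockafellar-Uryasev definition of CVaR at confidence level \alpha for a random cost C; the paper's exact constrained multi-objective formulation is not reproduced in this record and may differ.

% Standard CVaR definition; alpha in (0,1), C a scalar random cost.
\[
\mathrm{CVaR}_{\alpha}(C) \;=\; \inf_{\nu \in \mathbb{R}} \Big\{ \nu + \tfrac{1}{1-\alpha}\, \mathbb{E}\big[(C - \nu)_{+}\big] \Big\}
\]

For continuous cost distributions this equals \mathbb{E}[C \mid C \ge \mathrm{VaR}_{\alpha}(C)], the expected cost over the worst (1-\alpha) fraction of outcomes, so a safety constraint of the form \mathrm{CVaR}_{\alpha}(C) \le d bounds tail risk rather than only the mean cost.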
Author Cowen-Rivers, Alexander I.
Sootla, Aivar
Bou-Ammar, Haitham
Wang, Jun
Abdullah, Mohammed Amin
Palenicek, Daniel
Moens, Vincent
Author_xml – sequence: 1
  givenname: Alexander I.
  orcidid: 0000-0002-2669-9513
  surname: Cowen-Rivers
  fullname: Cowen-Rivers, Alexander I.
  email: alexander.cowen.rivers@huawei.com
  organization: Huawei Noah’s Ark Lab, Technical University Darmstadt
– sequence: 2
  givenname: Daniel
  surname: Palenicek
  fullname: Palenicek, Daniel
  organization: Huawei Noah’s Ark Lab, Technical University Darmstadt
– sequence: 3
  givenname: Vincent
  surname: Moens
  fullname: Moens, Vincent
  organization: Huawei Noah’s Ark Lab
– sequence: 4
  givenname: Mohammed Amin
  surname: Abdullah
  fullname: Abdullah, Mohammed Amin
  organization: Huawei Noah’s Ark Lab
– sequence: 5
  givenname: Aivar
  surname: Sootla
  fullname: Sootla, Aivar
  organization: Huawei Noah’s Ark Lab
– sequence: 6
  givenname: Jun
  surname: Wang
  fullname: Wang, Jun
  organization: Huawei Noah’s Ark Lab, University College London
– sequence: 7
  givenname: Haitham
  surname: Bou-Ammar
  fullname: Bou-Ammar, Haitham
  organization: Huawei Noah’s Ark Lab, University College London
ContentType Journal Article
Copyright The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2022
DOI 10.1007/s10994-021-06103-6
Discipline Computer Science
EISSN 1573-0565
EndPage 203
ISSN 0885-6125
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Active learning
Safe reinforcement learning
Gaussian process
Language English
ORCID 0000-0002-2669-9513
OpenAccessLink https://link.springer.com/content/pdf/10.1007/s10994-021-06103-6.pdf
PageCount 31
PublicationDate 2022-01-01
PublicationPlace New York
PublicationTitle Machine Learning
PublicationTitleAbbrev Mach Learn
PublicationYear 2022
Publisher Springer US
Springer Nature B.V
StartPage 173
SubjectTerms Active learning
Algorithms
Artificial Intelligence
Computer Science
Control
Gaussian process
Information theory
Learning
Machine Learning
Mechatronics
Multiple objective analysis
Natural Language Processing (NLP)
Robotics
Simulation and Modeling
Special Issue: Foundations of Data Science
Statistical methods
Title SAMBA: safe model-based & active reinforcement learning
URI https://link.springer.com/article/10.1007/s10994-021-06103-6
https://www.proquest.com/docview/2625127475
Volume 111