SAMBA: safe model-based & active reinforcement learning

Bibliographic Details
Published in Machine Learning, Vol. 111, No. 1, pp. 173-203
Main Authors Cowen-Rivers, Alexander I., Palenicek, Daniel, Moens, Vincent, Abdullah, Mohammed Amin, Sootla, Aivar, Wang, Jun, Bou-Ammar, Haitham
Format Journal Article
Language English
Published New York Springer US 01.01.2022
Springer Nature B.V
Online Access Get full text
ISSN 0885-6125
EISSN 1573-0565
DOI 10.1007/s10994-021-06103-6

Abstract In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel acquisition functions for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our acquisition functions and safety constraints.
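The abstract refers to conditional-value-at-risk (CVaR) constraints but does not restate them. As a reference point only, the LaTeX sketch below gives the standard Rockafellar-Uryasev definition of CVaR at confidence level \alpha for a random cost C; the paper's exact constrained multi-objective formulation is not reproduced in this record and may differ.

% Standard CVaR definition; alpha in (0,1), C a scalar random cost.
\[
\mathrm{CVaR}_{\alpha}(C) \;=\; \inf_{\nu \in \mathbb{R}} \Big\{ \nu + \tfrac{1}{1-\alpha}\, \mathbb{E}\big[(C - \nu)_{+}\big] \Big\}
\]

For continuous cost distributions this equals \mathbb{E}[C \mid C \ge \mathrm{VaR}_{\alpha}(C)], the expected cost over the worst (1-\alpha) fraction of outcomes, so a safety constraint of the form \mathrm{CVaR}_{\alpha}(C) \le d bounds tail risk rather than only the mean cost.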
Author Cowen-Rivers, Alexander I.
Sootla, Aivar
Bou-Ammar, Haitham
Wang, Jun
Abdullah, Mohammed Amin
Palenicek, Daniel
Moens, Vincent
Author_xml – sequence: 1
  givenname: Alexander I.
  orcidid: 0000-0002-2669-9513
  surname: Cowen-Rivers
  fullname: Cowen-Rivers, Alexander I.
  email: alexander.cowen.rivers@huawei.com
  organization: Huawei Noah’s Ark Lab, Technical University Darmstadt
– sequence: 2
  givenname: Daniel
  surname: Palenicek
  fullname: Palenicek, Daniel
  organization: Huawei Noah’s Ark Lab, Technical University Darmstadt
– sequence: 3
  givenname: Vincent
  surname: Moens
  fullname: Moens, Vincent
  organization: Huawei Noah’s Ark Lab
– sequence: 4
  givenname: Mohammed Amin
  surname: Abdullah
  fullname: Abdullah, Mohammed Amin
  organization: Huawei Noah’s Ark Lab
– sequence: 5
  givenname: Aivar
  surname: Sootla
  fullname: Sootla, Aivar
  organization: Huawei Noah’s Ark Lab
– sequence: 6
  givenname: Jun
  surname: Wang
  fullname: Wang, Jun
  organization: Huawei Noah’s Ark Lab, University College London
– sequence: 7
  givenname: Haitham
  surname: Bou-Ammar
  fullname: Bou-Ammar, Haitham
  organization: Huawei Noah’s Ark Lab, University College London
ContentType Journal Article
Copyright The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2022
DOI 10.1007/s10994-021-06103-6
Discipline Computer Science
EISSN 1573-0565
EndPage 203
ISSN 0885-6125
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Active learning
Safe reinforcement learning
Gaussian process
Language English
ORCID 0000-0002-2669-9513
OpenAccessLink https://link.springer.com/content/pdf/10.1007/s10994-021-06103-6.pdf
PageCount 31
PublicationDate 2022-01-01
PublicationPlace New York
PublicationTitle Machine Learning
PublicationTitleAbbrev Mach Learn
PublicationYear 2022
Publisher Springer US
Springer Nature B.V
StartPage 173
SubjectTerms Active learning
Algorithms
Artificial Intelligence
Computer Science
Control
Gaussian process
Information theory
Learning
Machine Learning
Mechatronics
Multiple objective analysis
Natural Language Processing (NLP)
Robotics
Simulation and Modeling
Special Issue: Foundations of Data Science
Statistical methods
Title SAMBA: safe model-based & active reinforcement learning
URI https://link.springer.com/article/10.1007/s10994-021-06103-6
https://www.proquest.com/docview/2625127475
Volume 111