Formal Specification and Testing for Reinforcement Learning

Bibliographic Details
Published in: Proceedings of the ACM on Programming Languages, Vol. 7, No. ICFP, pp. 125-158
Main Authors: Varshosaz, Mahsa; Ghaffari, Mohsen; Johnsen, Einar Broch; Wąsowski, Andrzej
Format: Journal Article
Language: English
Published: New York, NY, USA: ACM, 30.08.2023
ISSN: 2475-1421
EISSN: 2475-1421
DOI: 10.1145/3607835

Abstract: The development process for reinforcement learning applications is still exploratory rather than systematic. This exploratory nature reduces reuse of specifications between applications and increases the chances of introducing programming errors. This paper takes a step towards systematizing the development of reinforcement learning applications. We introduce a formal specification of reinforcement learning problems and algorithms, with a particular focus on temporal difference methods and their definitions in backup diagrams. We further develop a test harness for a large class of reinforcement learning applications based on temporal difference learning, including SARSA and Q-learning. The entire development is rooted in functional programming methods; starting with pure specifications and denotational semantics, ending with property-based testing and using compositional interpreters for a domain-specific term language as a test oracle for concrete implementations. We demonstrate the usefulness of this testing method on a number of examples, and evaluate with mutation testing. We show that our test suite is effective in killing mutants (90% mutants killed for 75% of subject agents). More importantly, almost half of all mutants are killed by generic write-once-use-everywhere tests that apply to any reinforcement learning problem modeled using our library, without any additional effort from the programmer.
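A minimal sketch of the kind of agent and property the abstract describes, written in plain Scala and assuming nothing from the paper's actual library or term language: the function below implements the tabular Q-learning (temporal-difference) backup, and the accompanying check illustrates a generic, problem-independent property (a zero temporal-difference error must leave the value table unchanged), using naive random sampling in place of a property-based testing framework such as ScalaCheck. All names (QLearningSketch, qLearningStep, zeroTdErrorIsFixpoint) are hypothetical and purely illustrative.

object QLearningSketch {
  type State  = Int
  type Action = Int
  type QTable = Map[(State, Action), Double]

  // One temporal-difference (Q-learning) backup:
  //   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
  def qLearningStep(q: QTable, s: State, a: Action, r: Double, s2: State,
                    actions: Seq[Action], alpha: Double, gamma: Double): QTable = {
    val old  = q.getOrElse((s, a), 0.0)
    val best = actions.map(a2 => q.getOrElse((s2, a2), 0.0)).max
    q.updated((s, a), old + alpha * (r + gamma * best - old))
  }

  // A generic, problem-independent property in the spirit of write-once-
  // use-everywhere tests: when the TD error is zero, the backup must not
  // change the Q-table. Checked here by naive random sampling.
  def zeroTdErrorIsFixpoint(trials: Int = 1000): Boolean = {
    val rng     = new scala.util.Random(42)
    val states  = 0 until 5
    val actions = Seq(0, 1, 2)
    (1 to trials).forall { _ =>
      // Random Q-table, transition, and hyperparameters.
      val q: QTable =
        (for (st <- states; ac <- actions) yield (st, ac) -> rng.nextDouble()).toMap
      val s     = states(rng.nextInt(states.size))
      val a     = actions(rng.nextInt(actions.size))
      val s2    = states(rng.nextInt(states.size))
      val alpha = rng.nextDouble()
      val gamma = rng.nextDouble()
      // Pick the reward so that the TD target equals the current estimate.
      val r  = q((s, a)) - gamma * actions.map(a2 => q((s2, a2))).max
      val q2 = qLearningStep(q, s, a, r, s2, actions, alpha, gamma)
      math.abs(q2((s, a)) - q((s, a))) < 1e-9
    }
  }

  def main(args: Array[String]): Unit =
    println(s"zero-TD-error fixpoint property holds: ${zeroTdErrorIsFixpoint()}")
}

The property is deliberately independent of any concrete reinforcement learning problem, in the spirit of the generic tests the abstract mentions; the paper's own harness instead derives such oracles from a compositional interpreter for its domain-specific specification language.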
Article Number: 193
Authors:
– Varshosaz, Mahsa (IT University of Copenhagen, Denmark; mahv@itu.dk; ORCID 0000-0002-4776-883X)
– Ghaffari, Mohsen (IT University of Copenhagen, Denmark; mohg@itu.dk; ORCID 0000-0002-1939-9053)
– Johnsen, Einar Broch (University of Oslo, Norway; einarj@ifi.uio.no; ORCID 0000-0001-5382-3949)
– Wąsowski, Andrzej (IT University of Copenhagen, Denmark; wasowski@itu.dk; ORCID 0000-0003-0532-2685)
Copyright: Owner/Author (open access)
Discipline: Computer Science
Grant Information: Innovation Fund Denmark, grant DIREC
Open Access: Yes
Peer Reviewed: Yes
Keywords: specification-based testing; Scala; reinforcement learning
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Open Access Link: http://hdl.handle.net/10852/108921
Page Count: 34
Publication Title: Proceedings of the ACM on Programming Languages (ACM PACMPL)
Publication Date: 2023-08-30
Publisher: ACM, New York, NY, USA
Subject Terms:
– Software and its engineering -- Software testing and debugging
– Theory of computation -- Program specifications
URI: https://dl.acm.org/doi/10.1145/3607835
     http://hdl.handle.net/10852/108921