Formal Specification and Testing for Reinforcement Learning
The development process for reinforcement learning applications is still exploratory rather than systematic. This exploratory nature reduces reuse of specifications between applications and increases the chances of introducing programming errors. This paper takes a step towards systematizing the development of reinforcement learning applications.
Published in | Proceedings of the ACM on Programming Languages, Vol. 7, No. ICFP, pp. 125–158 |
Main Authors | Varshosaz, Mahsa; Ghaffari, Mohsen; Johnsen, Einar Broch; Wąsowski, Andrzej |
Format | Journal Article |
Language | English |
Published | New York, NY, USA: ACM, 30.08.2023 |
Subjects | Software and its engineering; Software testing and debugging; Theory of computation; Program specifications |
Online Access | http://hdl.handle.net/10852/108921 |
ISSN | 2475-1421 |
DOI | 10.1145/3607835 |
Abstract | The development process for reinforcement learning applications is still exploratory rather than systematic. This exploratory nature reduces reuse of specifications between applications and increases the chances of introducing programming errors. This paper takes a step towards systematizing the development of reinforcement learning applications. We introduce a formal specification of reinforcement learning problems and algorithms, with a particular focus on temporal difference methods and their definitions in backup diagrams. We further develop a test harness for a large class of reinforcement learning applications based on temporal difference learning, including SARSA and Q-learning. The entire development is rooted in functional programming methods; starting with pure specifications and denotational semantics, ending with property-based testing and using compositional interpreters for a domain-specific term language as a test oracle for concrete implementations. We demonstrate the usefulness of this testing method on a number of examples, and evaluate with mutation testing. We show that our test suite is effective in killing mutants (90% mutants killed for 75% of subject agents). More importantly, almost half of all mutants are killed by generic write-once-use-everywhere tests that apply to any reinforcement learning problem modeled using our library, without any additional effort from the programmer. |
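The abstract describes generic, write-once property-based tests for temporal-difference learners such as SARSA and Q-learning. As a rough illustration only (not the paper's library or API), the ScalaCheck sketch below states two such generic properties over a hand-rolled tabular Q-learning backup; the object `TDUpdateProps`, the function `qLearningUpdate`, and the small state and action spaces are all hypothetical, and ScalaCheck is assumed to be available on the classpath.

```scala
// Hypothetical sketch, not the paper's actual test harness.
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object TDUpdateProps extends Properties("TabularTDUpdate") {

  type State  = Int
  type Action = Int
  type Q      = Map[(State, Action), Double]

  // One Q-learning backup: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
  def qLearningUpdate(q: Q, s: State, a: Action, r: Double, s1: State,
                      alpha: Double, gamma: Double, actions: Seq[Action]): Q = {
    val maxNext = actions.map(a1 => q((s1, a1))).max
    val old     = q((s, a))
    q.updated((s, a), old + alpha * (r + gamma * maxNext - old))
  }

  val states: Seq[State]   = 0 until 5
  val actions: Seq[Action] = 0 until 3

  // Generator for fully populated Q-tables with arbitrary values.
  val genQ: Gen[Q] =
    Gen.listOfN(states.size * actions.size, Gen.choose(-10.0, 10.0)).map { vs =>
      (for { s <- states; a <- actions } yield (s, a)).zip(vs).toMap
    }

  // Generic property 1: a backup changes only the entry for the visited (s, a).
  property("backup touches only (s, a)") =
    forAll(genQ, Gen.oneOf(states), Gen.oneOf(actions),
           Gen.choose(-1.0, 1.0), Gen.oneOf(states)) { (q, s, a, r, s1) =>
      val q1 = qLearningUpdate(q, s, a, r, s1, 0.1, 0.9, actions)
      q.keys.filter(_ != ((s, a))).forall(k => q1(k) == q(k))
    }

  // Generic property 2: the new estimate is a convex combination of the old
  // estimate and the temporal-difference target, so it lies between them.
  property("backup moves towards the TD target") =
    forAll(genQ, Gen.oneOf(states), Gen.oneOf(actions),
           Gen.choose(-1.0, 1.0), Gen.oneOf(states)) { (q, s, a, r, s1) =>
      val target   = r + 0.9 * actions.map(a1 => q((s1, a1))).max
      val q1       = qLearningUpdate(q, s, a, r, s1, 0.1, 0.9, actions)
      val (lo, hi) = (math.min(q((s, a)), target), math.max(q((s, a)), target))
      q1((s, a)) >= lo - 1e-9 && q1((s, a)) <= hi + 1e-9
    }
}
```

Properties of this shape hold for any problem modeled with a tabular temporal-difference learner, which is the spirit of the write-once-use-everywhere tests the abstract reports; the paper's own harness additionally uses a compositional interpreter of a domain-specific term language as the test oracle.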
ArticleNumber | 193 |
Author | Ghaffari, Mohsen; Wąsowski, Andrzej; Varshosaz, Mahsa; Johnsen, Einar Broch |
Author_xml | 1. Varshosaz, Mahsa (ORCID 0000-0002-4776-883X; mahv@itu.dk; IT University of Copenhagen, Denmark) – 2. Ghaffari, Mohsen (ORCID 0000-0002-1939-9053; mohg@itu.dk; IT University of Copenhagen, Denmark) – 3. Johnsen, Einar Broch (ORCID 0000-0001-5382-3949; einarj@ifi.uio.no; University of Oslo, Norway) – 4. Wąsowski, Andrzej (ORCID 0000-0003-0532-2685; wasowski@itu.dk; IT University of Copenhagen, Denmark) |
ContentType | Journal Article |
Copyright | Owner/Author; info:eu-repo/semantics/openAccess |
DOI | 10.1145/3607835 |
DatabaseName | CrossRef NORA - Norwegian Open Research Archives |
DatabaseTitle | CrossRef |
Discipline | Computer Science |
EISSN | 2475-1421 |
EndPage | 158 |
GrantInformation_xml | fundername: Innovation Fund Denmark; grantid: DIREC; funderid: http://dx.doi.org/10.13039/ |
ISSN | 2475-1421 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | ICFP |
Keywords | specification-based testing; Scala; reinforcement learning |
Language | English |
License | This work is licensed under a Creative Commons Attribution 4.0 International License. |
OpenAccessLink | http://hdl.handle.net/10852/108921 |
PageCount | 34 |
PublicationDate | 2023-08-30 |
PublicationPlace | New York, NY, USA |
PublicationTitle | Proceedings of the ACM on Programming Languages |
PublicationTitleAbbrev | ACM PACMPL |
PublicationYear | 2023 |
Publisher | ACM |
SourceID | cristin crossref acm |
SourceType | Open Access Repository Index Database Publisher |
StartPage | 125 |
SubjectTerms | Program specifications; Software and its engineering; Software testing and debugging; Theory of computation |
SubjectTermsDisplay | Software and its engineering -- Software testing and debugging Theory of computation -- Program specifications |
Title | Formal Specification and Testing for Reinforcement Learning |
URI | https://dl.acm.org/doi/10.1145/3607835 http://hdl.handle.net/10852/108921 |
Volume | 7 |