Debugging a Policy: Automatic Action-Policy Testing in AI Planning

Bibliographic Details
Published in	Proceedings of the International Conference on Automated Planning and Scheduling Vol. 32; pp. 353 - 361
Main Authors	Steinmetz, Marcel, Fišer, Daniel, Eniser, Hasan Ferit, Ferber, Patrick, Gros, Timo P., Heim, Philippe, Höller, Daniel, Schuler, Xandra, Wüstholz, Valentin, Christakis, Maria, Hoffmann, Jörg
Format	Journal Article
Language	English
Published	13.06.2022
Abstract	Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a "bug" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem, deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods generating test states biased to poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.
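The bug definition in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy unit-cost task, the names GRAPH, POLICY, policy_cost, optimal_cost, is_bug and classify, and the particular way the bounds are used are all assumptions made for illustration. The sketch treats a state as a bug when some alternative policy reaches the goal more cheaply than π does, and shows one way an oracle might answer from bounds on the optimal cost without computing it exactly.

```python
import math
from collections import deque

# Hypothetical toy task with unit-cost, deterministic transitions.
GRAPH = {
    "A": ["B", "C"],
    "B": ["G"],
    "C": ["D"],
    "D": ["G"],
    "G": [],
    "X": ["X"],  # dead end: the goal is unreachable from X
}
GOAL = "G"

# Hypothetical policy under test: it takes the long route from A.
POLICY = {"A": "C", "B": "G", "C": "D", "D": "G", "X": "X"}


def policy_cost(state, policy=POLICY, goal=GOAL):
    """Cost of following the policy from `state`; infinite if it never reaches the goal."""
    seen = set()
    cost = 0
    while state != goal:
        if state in seen or state not in policy:
            return math.inf  # the policy loops or is undefined here
        seen.add(state)
        state = policy[state]
        cost += 1
    return cost


def optimal_cost(start, graph=GRAPH, goal=GOAL):
    """Cheapest cost achievable by any policy, via breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            return dist[s]
        for t in graph[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return math.inf


def is_bug(state):
    """Exact check: a bug only if an alternative policy does strictly better."""
    return policy_cost(state) > optimal_cost(state)


def classify(state, lower, upper):
    """Oracle sketch using bounds lower <= optimal cost <= upper (assumed valid)."""
    cost = policy_cost(state)
    if upper < cost:
        return "bug"        # some plan is known to beat the policy
    if lower >= cost:
        return "not a bug"  # no plan can beat the policy
    return "unknown"        # bounds too loose to decide
```

In this toy task, state A is a bug because the policy needs three steps where two suffice, while the dead end X is not a bug: the policy fails there, but so would every alternative. The classify function returns "unknown" whenever the supplied bounds do not separate the policy's cost from the optimum.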
DOI	10.1609/icaps.v32i1.19820
EISSN	2334-0843
ISSN	2334-0835