Debugging a Policy: Automatic Action-Policy Testing in AI Planning

Bibliographic Details
Published in Proceedings of the International Conference on Automated Planning and Scheduling Vol. 32; pp. 353-361
Main Authors Steinmetz, Marcel; Fišer, Daniel; Eniser, Hasan Ferit; Ferber, Patrick; Gros, Timo P.; Heim, Philippe; Höller, Daniel; Schuler, Xandra; Wüstholz, Valentin; Christakis, Maria; Hoffmann, Jörg
Format Journal Article
Language English
Published 13.06.2022

Abstract Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a "bug" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem, deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods generating test states biased to poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.
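The bug notion in the abstract (a state counts as a bug only if some alternative policy does better) can be illustrated with a minimal sketch. Everything below is a hypothetical toy and not the authors' implementation: the unit-cost graph `GRAPH`, the hand-written `POLICY`, and the helper names are assumptions made for illustration, and a breadth-first search stands in for whatever bound computation a real test oracle would use.

```python
from collections import deque

# Hypothetical toy task: a unit-cost directed graph whose states are strings.
GRAPH = {
    "A": ["B", "C"],
    "B": ["G"],
    "C": ["D"],
    "D": ["G"],
    "X": ["X"],  # dead end: the goal is unreachable from X
    "G": [],
}
GOAL = "G"

# Hypothetical policy under test: maps each state to the successor it picks.
POLICY = {"A": "C", "B": "G", "C": "D", "D": "G", "X": "X"}


def policy_cost(state, max_steps=50):
    """Cost of running the policy from `state`; None if it never reaches the goal."""
    steps = 0
    while state != GOAL:
        if steps >= max_steps or state not in POLICY:
            return None
        state = POLICY[state]
        steps += 1
    return steps


def plan_cost(state):
    """Cost of a shortest plan from `state` via BFS; None if the goal is unreachable.

    The cost of any plan bounds what an alternative policy can achieve;
    on this toy graph BFS happens to return the exact optimum.
    """
    frontier = deque([(state, 0)])
    seen = {state}
    while frontier:
        current, dist = frontier.popleft()
        if current == GOAL:
            return dist
        for nxt in GRAPH[current]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def is_confirmed_bug(state):
    """True if some alternative behaviour provably does better than the policy."""
    alternative = plan_cost(state)
    if alternative is None:
        return False  # failure is unavoidable, so the policy is not to blame
    achieved = policy_cost(state)
    return achieved is None or achieved > alternative
```

In this toy setting, state `A` is flagged because the policy takes three steps where a two-step plan exists, while the dead-end state `X` is not flagged because no alternative policy could reach the goal from it either.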
DOI 10.1609/icaps.v32i1.19820
EISSN 2334-0843
ISSN 2334-0835