Predicting Patch Correctness Based on the Similarity of Failing Test Cases
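Main Authors | Tian, Haoye; Li, Yinghua; Pian, Weiguo; Kaboré, Abdoul Kader; Liu, Kui; Habib, Andrew; Klein, Jacques; Bissyande, Tegawendé F |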
---|---|
Format | Journal Article |
Language | English |
Published | 28.07.2021 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Software Engineering |
Online Access | https://arxiv.org/abs/2107.13296 |
Abstract | Towards predicting patch correctness in automated program repair (APR), we
propose a simple but novel hypothesis on how the link between patch behaviour
and failing test specifications can be drawn: similar failing test cases should
require similar patches. We then propose BATS, an unsupervised learning-based
system that predicts patch correctness by checking patch Behaviour Against
failing Test Specification. BATS exploits deep representation learning models
for code and patches: for a given failing test case, the resulting embedding is
used to compute similarity metrics in a search for similar historical test
cases. The patches that were applied for those test cases are then used as a
proxy for assessing the correctness of a generated patch. Experimentally, we
first validate our hypothesis by assessing whether ground-truth developer
patches cluster together in the same way that their associated failing test
cases are clustered. Then, after collecting a large dataset of 1278 plausible
patches (written by developers or generated by 32 APR tools), we use BATS to
predict correctness: BATS achieves an AUC between 0.557 and 0.718 and a recall
between 0.562 and 0.854 in identifying correct patches. Compared against
previous work, we demonstrate that our approach outperforms the state of the
art in patch correctness prediction and, in contrast with prior machine
learning-based approaches, does not need large labeled patch datasets. While
BATS is constrained by the availability of similar test cases, we show that it
can still be complementary to existing approaches: used in conjunction with a
recent approach implementing supervised learning, BATS improves the overall
recall in detecting correct patches. We finally show that BATS can be
complementary to the state-of-the-art PATCH-SIM dynamic approach to identifying
the correct patches for APR tools. |
---|---|
Author | Bissyande, Tegawendé F; Kaboré, Abdoul Kader; Klein, Jacques; Tian, Haoye; Habib, Andrew; Li, Yinghua; Liu, Kui; Pian, Weiguo |
Author_xml | – sequence: 1 givenname: Haoye surname: Tian fullname: Tian, Haoye – sequence: 2 givenname: Yinghua surname: Li fullname: Li, Yinghua – sequence: 3 givenname: Weiguo surname: Pian fullname: Pian, Weiguo – sequence: 4 givenname: Abdoul Kader surname: Kaboré fullname: Kaboré, Abdoul Kader – sequence: 5 givenname: Kui surname: Liu fullname: Liu, Kui – sequence: 6 givenname: Andrew surname: Habib fullname: Habib, Andrew – sequence: 7 givenname: Jacques surname: Klein fullname: Klein, Jacques – sequence: 8 givenname: Tegawendé F surname: Bissyande fullname: Bissyande, Tegawendé F |
BackLink | https://doi.org/10.48550/arXiv.2107.13296 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY GOX |
DOI | 10.48550/arxiv.2107.13296 |
DatabaseName | arXiv Computer Science arXiv.org |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | https://arxiv.org/abs/2107.13296 |
ParticipantIDs | arxiv_primary_2107_13296 |
PublicationCentury | 2000 |
PublicationDate | 2021-07-28 |
PublicationDateYYYYMMDD | 2021-07-28 |
PublicationDate_xml | – month: 07 year: 2021 text: 2021-07-28 day: 28 |
PublicationDecade | 2020 |
PublicationYear | 2021 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Artificial Intelligence; Computer Science - Software Engineering |
Title | Predicting Patch Correctness Based on the Similarity of Failing Test Cases |
URI | https://arxiv.org/abs/2107.13296 |
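
The retrieval idea described in the abstract (find historical failing test cases similar to the current one, then compare the candidate patch with the patches that fixed them) can be sketched roughly as follows. This is a minimal illustration and not the BATS implementation: it assumes embeddings for test cases and patches have already been computed by some representation model, and the cosine measure, the mean aggregation, the `test_sim_cutoff` value, and the names `cosine` and `score_patch` are all hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_patch(
    failing_test: Vector,
    candidate_patch: Vector,
    history: List[Tuple[Vector, Vector]],
    test_sim_cutoff: float = 0.8,  # assumed value, not taken from the paper
) -> Optional[float]:
    """Score a candidate patch against patches of similar historical failing tests.

    `history` holds (test_embedding, patch_embedding) pairs for past fixes.
    Returns the mean similarity between the candidate patch and the retrieved
    historical patches, or None when no historical test is similar enough.
    """
    # Step 1: retrieve patches whose failing tests resemble the current one.
    similar_patches = [
        patch
        for test, patch in history
        if cosine(failing_test, test) >= test_sim_cutoff
    ]
    if not similar_patches:
        return None  # no similar test case available, so no prediction

    # Step 2: use the retrieved patches as a proxy for the expected fix.
    sims = [cosine(candidate_patch, p) for p in similar_patches]
    return sum(sims) / len(sims)
```

A higher score suggests the candidate patch resembles patches that fixed similar failures; a caller would compare it against a decision threshold of its own choosing. The `None` return mirrors the limitation stated in the abstract, namely that the approach depends on the availability of similar test cases.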