Predicting Patch Correctness Based on the Similarity of Failing Test Cases

Bibliographic Details
Main Authors	Tian, Haoye; Li, Yinghua; Pian, Weiguo; Kaboré, Abdoul Kader; Liu, Kui; Habib, Andrew; Klein, Jacques; Bissyande, Tegawendé F
Format	Journal Article (arXiv preprint, not peer reviewed)
Language	English
Published	28.07.2021
Subjects	Computer Science - Artificial Intelligence; Computer Science - Software Engineering
Online Access	https://arxiv.org/abs/2107.13296

Abstract	Towards predicting patch correctness in automated program repair (APR), we propose a simple but novel hypothesis on how the link between patch behaviour and failing test specifications can be drawn: similar failing test cases should require similar patches. We then propose BATS, an unsupervised learning-based system that predicts patch correctness by checking patch Behaviour Against failing Test Specification. BATS exploits deep representation learning models for code and patches: for a given failing test case, the resulting embedding is used to compute similarity metrics in the search for historically similar test cases, in order to identify the patches that were applied to them; these patches then serve as a proxy for assessing the correctness of a generated patch. Experimentally, we first validate our hypothesis by assessing whether ground-truth developer patches cluster together in the same way as their associated failing test cases. Then, after collecting a large dataset of 1278 plausible patches (written by developers or generated by some 32 APR tools), we use BATS to predict correctness: BATS achieves an AUC between 0.557 and 0.718 and a recall between 0.562 and 0.854 in identifying correct patches. Compared against previous work, we demonstrate that our approach outperforms the state of the art in patch correctness prediction without requiring the large labeled patch datasets that prior machine learning-based approaches depend on. While BATS is constrained by the availability of similar test cases, we show that it can still complement existing approaches: used in conjunction with a recent approach implementing supervised learning, BATS improves the overall recall in detecting correct patches. We finally show that BATS can complement PATCH-SIM, the state-of-the-art dynamic approach, in identifying correct patches produced by APR tools.
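The retrieve-then-compare idea summarized in the abstract can be illustrated with a minimal sketch. It assumes that embeddings for failing test cases and patches have already been produced by some representation learning model (here they are plain float vectors), that "similar" means cosine similarity at or above a cutoff, and that a generated patch is scored by its mean cosine similarity to the developer patches of the retrieved test cases. The function names `cosine` and `predict_patch_correctness`, both thresholds, and the mean aggregation are hypothetical illustration choices, not the paper's exact configuration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_patch_correctness(
    failing_test: Vector,
    generated_patch: Vector,
    history: List[Tuple[Vector, Vector]],
    test_cutoff: float = 0.8,   # hypothetical: minimum similarity for a "similar" test
    patch_cutoff: float = 0.5,  # hypothetical: decision threshold on the patch score
) -> Optional[Tuple[bool, float]]:
    """Return (predicted_correct, score), or None if no similar test case is found.

    `history` holds (failing_test_embedding, developer_patch_embedding) pairs.
    """
    # 1. Retrieve the developer patches of historical tests similar to this one.
    neighbour_patches = [
        patch for test, patch in history
        if cosine(failing_test, test) >= test_cutoff
    ]
    if not neighbour_patches:
        return None  # no similar test case: no prediction can be made
    # 2. Score the generated patch by its mean similarity to those patches.
    score = sum(cosine(generated_patch, p) for p in neighbour_patches) / len(neighbour_patches)
    # 3. Threshold the score to obtain a correctness prediction.
    return score >= patch_cutoff, score


# Toy usage with 3-dimensional stand-in embeddings.
history = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),  # a similar past test and its developer patch
    ([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]),  # an unrelated past test
]
print(predict_patch_correctness([0.95, 0.05, 0.0], [1.0, 0.0, 0.0], history))
```

Returning `None` when no historical test case passes the cutoff mirrors the limitation stated in the abstract: a prediction is only possible when similar test cases exist.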
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2107.13296