CURE: Code-Aware Neural Machine Translation for Automatic Program Repair
Published in | arXiv.org |
---|---|
Main Authors | Jiang, Nan; Lutellier, Thibaud; Tan, Lin |
Format | Paper Journal Article |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 02.09.2021 |
Abstract | Automatic program repair (APR) is crucial for improving software reliability. Recently, neural machine translation (NMT) techniques have been used to fix software bugs automatically. While promising, these approaches have two major limitations: their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Because of these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE uses a new code-aware search strategy that finds more correct fixes by focusing on compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks. |
Author | Jiang, Nan; Lutellier, Thibaud; Tan, Lin |
BackLink | Published paper (access to full text may be restricted): https://doi.org/10.1109/ICSE43902.2021.00107; paper in arXiv: https://doi.org/10.48550/arXiv.2103.00073 |
Copyright | 2021. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.48550/arxiv.2103.00073 |
Discipline | Physics |
EISSN | 2331-8422 |
ExternalDocumentID | 2103_00073 |
Genre | Working Paper/Pre-Print |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/2103.00073 |
PublicationDate | 2021-09-02 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
Publisher | Cornell University Library, arXiv.org |
SecondaryResourceType | preprint |
SourceID | arxiv; proquest |
SourceType | Open Access Repository; Aggregation Database |
SubjectTerms | Benchmarks; Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Software Engineering; Consumer goods; Machine translation; Programming languages; Repair; Search methods; Software reliability; Source code |
Title | CURE: Code-Aware Neural Machine Translation for Automatic Program Repair |
URI | https://www.proquest.com/docview/2495198403; https://arxiv.org/abs/2103.00073 |