The impacts of techniques, programs and tests on automated program repair: An empirical study
Published in | The Journal of systems and software Vol. 137; pp. 480 - 496 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.03.2018 |
Subjects | |
ISSN | 0164-1212 1873-1228 |
DOI | 10.1016/j.jss.2017.06.039 |
Summary: | Highlights: • An extensive study of the impacts of several factors on program repair is presented. • Performance of the techniques declines as program size increases. • Adding more passed tests does not affect the real repair effectiveness. • Adding more failed tests is helpful to some extent for the deterministic techniques. • Four techniques find more than 80% of patches within the first 50% of the search space.
Manual program repair is notoriously tedious, error-prone, and costly, especially for modern large-scale projects. Automated program repair can find program patches without much human intervention, greatly reducing the burden on developers and accelerating software delivery. Much research effort has therefore been dedicated to designing powerful program repair techniques. Although various program repair techniques have been proposed to date, to our knowledge there has been no extensive study of the impacts of repair techniques, subject programs, and test suites on repair effectiveness and efficiency. In this paper, we perform such a study by repairing 180 seeded and real faults from 17 small- to large-sized programs. We study the impacts of five representative automated program repair techniques (GenProg, RSRepair, a brute-force-based technique, AE, and Kali) on the repair results. We further investigate the impacts of different subject programs and test suites on the effectiveness and efficiency of program repair techniques. Our study yields a number of interesting findings: the brute-force-based technique generates the largest number of patches but is also the most costly, while Kali is the most efficient and has medium effectiveness among the studied techniques; techniques that work well on small programs become too costly or ineffective when applied to large programs; since tool-reported patches may overfit the selected test cases, we calculate the false positive rates and find that the influence of failed test cases is much larger than that of passed test cases; finally, and surprisingly, all the studied techniques except RSRepair can find more than 80% of successful patches within the first 50% of the search space. |
---|---|