Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study
Published in | ACM transactions on software engineering and methodology Vol. 24; no. 4; pp. 1 - 49 |
---|---|
Main Authors | Fraser, Gordon; Staats, Matt; McMinn, Phil; Arcuri, Andrea; Padberg, Frank |
Format | Journal Article |
Language | English |
Published | 01.08.2015 |
Subjects | Automation; Communities; Computer programs; Construction specifications; Developers; Hand tools; Specifications; Tasks |
Online Access | Get full text |
ISSN | 1049-331X (print) 1557-7392 (electronic) |
DOI | 10.1145/2699688 |
Abstract | Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the task of testing for the developer, as testing is reduced to checking the results of tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the limited adoption in industry indicates this assumption may not be correct, and calls into question the practical value of test generation tools. To investigate this issue, we performed two controlled experiments comparing a total of 97 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). However, on the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners. |
---|---|
AbstractList | Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the task of testing for the developer, as testing is reduced to checking the results of tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the limited adoption in industry indicates this assumption may not be correct, and calls into question the practical value of test generation tools. To investigate this issue, we performed two controlled experiments comparing a total of 97 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). However, on the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners. |
Author | Fraser, Gordon McMinn, Phil Staats, Matt Padberg, Frank Arcuri, Andrea |
Author_xml | – sequence: 1 givenname: Gordon surname: Fraser fullname: Fraser, Gordon organization: University of Sheffield, Sheffield, UK – sequence: 2 givenname: Matt surname: Staats fullname: Staats, Matt organization: University of Luxembourg, Luxembourg – sequence: 3 givenname: Phil surname: McMinn fullname: McMinn, Phil organization: University of Sheffield, Sheffield, UK – sequence: 4 givenname: Andrea surname: Arcuri fullname: Arcuri, Andrea organization: Simula Research Laboratory, Lysaker, Norway – sequence: 5 givenname: Frank surname: Padberg fullname: Padberg, Frank organization: Karlsruhe Institute of Technology, Karlsruhe, Germany |
BookMark | eNpl0EFLwzAUB_AgE5xT_Aq56aWaNEnTnmTMuQkDwW3gyZKlrxBJm5qkyL69ddtJT-8dfv_H43-JRq1rAaEbSu4p5eIhzYoiy_MzNKZCyESyIh0NO-FFwhh9v0CXIXwSQhlJ-Rh9PDkIeNpH16gIFd62JuINhIgX0IJX0bgWv4Gydo-XYDu8dnX8Vh4OCHx4xFM8c230ztohP286441WFq9jX-2v0HmtbIDr05yg7fN8M1smq9fFy2y6SjSjNCaqkFIRSTPFeU1roQivACRN691OK8mFyHglKlYzRTLQLM8U01xwnUNNxC5jE3R3vNt599UPn5WNCRqsVS24PpRUSsIYIzQf6O2Rau9C8FCXnTeN8vuSkvK3wfLU4CCTP1KbeGgkemXsP_8DDeRzgQ |
CitedBy_id | crossref_primary_10_1007_s10664_025_10635_z crossref_primary_10_1016_j_infsof_2022_106994 crossref_primary_10_1016_j_jss_2018_03_052 crossref_primary_10_1016_j_jss_2021_110933 crossref_primary_10_1002_stvr_1660 crossref_primary_10_1109_TSE_2018_2877664 crossref_primary_10_1002_smr_2158 crossref_primary_10_1145_3695988 crossref_primary_10_1007_s11219_019_09446_5 crossref_primary_10_1109_TSE_2015_2448531 crossref_primary_10_1007_s10664_024_10451_x crossref_primary_10_1016_j_jss_2018_06_024 crossref_primary_10_1109_ACCESS_2020_3022876 crossref_primary_10_1007_s10664_019_09692_y crossref_primary_10_1145_3487569 crossref_primary_10_1002_stvr_1627 crossref_primary_10_1002_stvr_1748 crossref_primary_10_1109_ACCESS_2022_3222803 crossref_primary_10_1109_TSE_2024_3381015 crossref_primary_10_1007_s10664_019_09765_y crossref_primary_10_1109_TSE_2022_3176725 crossref_primary_10_1016_j_infsof_2018_05_001 crossref_primary_10_1007_s10664_021_10094_2 |
Cites_doi | 10.1109/SSBSE.2010.25 10.1109/ICST.2012.92 10.1109/ICST.2013.11 10.1145/2076021.2048117 10.1145/1868321.1868326 10.1002/spe.602 10.1145/2379776.2379787 10.1111/1467-9280.00272 10.1145/800027.808424 10.1002/stvr.435 10.1145/1297846.1297902 10.1109/TSE.2005.97 10.1002/stvr.1486 10.1109/ASE.2011.6100138 10.1109/TSE.2012.14 10.1109/TSE.2002.1027796 10.5555/1792786.1792798 10.1109/ICST.2010.54 10.1109/SEAA.2012.42 10.1109/32.799955 10.1145/1007512.1007528 10.1109/TSE.2006.92 10.1007/s10710-010-9112-3 10.1145/1858996.1859035 10.1145/1572272.1572280 10.1145/2025113.2025179 10.1109/ICST.2011.53 10.1145/2001420.2001445 10.1002/stvr.v14:2 10.1145/2483760.2483774 10.1016/j.jss.2010.07.026 10.5555/2337223.2337326 10.1109/ICST.2013.13 10.1145/1985793.1985820 10.1109/TSE.2011.93 10.1016/S0950-5849(01)00190-2 10.1109/TSE.2009.71 10.1145/1062455.1062530 10.1145/2338965.2336776 |
ContentType | Journal Article |
DBID | AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
DOI | 10.1145/2699688 |
DatabaseName | CrossRef Computer and Information Systems Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef Computer and Information Systems Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Advanced Technologies Database with Aerospace ProQuest Computer Science Collection Computer and Information Systems Abstracts Professional |
DatabaseTitleList | CrossRef Computer and Information Systems Abstracts |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 1557-7392 |
EndPage | 49 |
ExternalDocumentID | 10_1145_2699688 |
GroupedDBID | --Z -DZ -~X .4S .DC 23M 4.4 5GY 5VS 6J9 8US AAHTB AAKMM AALFJ AAYFX AAYXX ABPEJ ABPPZ ACGFO ACGOD ACM ADBCU ADL ADMLS AEBYY AEFXT AEJOY AENEX AENSD AETEA AFWIH AFWXC AIAGR AIKLT AKRVB ALMA_UNASSIGNED_HOLDINGS ARCSS ASPBG AVWKF BDXCO CCLIF CITATION CS3 D0L EBS EDO EJD FEDTE GUFHI HGAVV H~9 I07 LHSKQ P1C P2P PQQKQ RNS ROL TUS UPT YR2 ZCA 7SC 8FD JQ2 L7M L~C L~D |
ID | FETCH-LOGICAL-c311t-a977a0716a44f1f5a04dee712fbbca745564d5d3f3a06ec386a3c454c8ef05b63 |
ISSN | 1049-331X |
IngestDate | Fri Jul 11 14:18:12 EDT 2025 Thu Apr 24 22:54:03 EDT 2025 Thu Jul 03 08:25:00 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-c311t-a977a0716a44f1f5a04dee712fbbca745564d5d3f3a06ec386a3c454c8ef05b63 |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
PQID | 1770333018 |
PQPubID | 23500 |
PageCount | 49 |
ParticipantIDs | proquest_miscellaneous_1770333018 crossref_primary_10_1145_2699688 crossref_citationtrail_10_1145_2699688 |
PublicationCentury | 2000 |
PublicationDate | 2015-08-01 |
PublicationDateYYYYMMDD | 2015-08-01 |
PublicationDate_xml | – month: 08 year: 2015 text: 2015-08-01 day: 01 |
PublicationDecade | 2010 |
PublicationTitle | ACM transactions on software engineering and methodology |
PublicationYear | 2015 |
References | e_1_2_1_42_1 e_1_2_1_20_1 e_1_2_1_41_1 e_1_2_1_40_1 e_1_2_1_23_1 e_1_2_1_21_1 e_1_2_1_22_1 Li N. (e_1_2_1_24_1) e_1_2_1_27_1 e_1_2_1_28_1 e_1_2_1_25_1 e_1_2_1_26_1 Fraser G. (e_1_2_1_11_1) e_1_2_1_29_1 Sautter G. (e_1_2_1_33_1) e_1_2_1_7_1 e_1_2_1_31_1 e_1_2_1_8_1 e_1_2_1_30_1 e_1_2_1_5_1 e_1_2_1_6_1 e_1_2_1_3_1 e_1_2_1_12_1 e_1_2_1_35_1 e_1_2_1_4_1 e_1_2_1_13_1 e_1_2_1_34_1 e_1_2_1_1_1 e_1_2_1_10_1 e_1_2_1_2_1 e_1_2_1_32_1 e_1_2_1_16_1 e_1_2_1_39_1 e_1_2_1_17_1 e_1_2_1_38_1 e_1_2_1_14_1 e_1_2_1_37_1 e_1_2_1_15_1 e_1_2_1_36_1 e_1_2_1_9_1 e_1_2_1_18_1 e_1_2_1_19_1 |
References_xml | – ident: e_1_2_1_7_1 doi: 10.1109/SSBSE.2010.25 – ident: e_1_2_1_10_1 doi: 10.1109/ICST.2012.92 – ident: e_1_2_1_1_1 doi: 10.1109/ICST.2013.11 – ident: e_1_2_1_5_1 doi: 10.1145/2076021.2048117 – ident: e_1_2_1_19_1 doi: 10.1145/1868321.1868326 – ident: e_1_2_1_6_1 doi: 10.1002/spe.602 – ident: e_1_2_1_16_1 doi: 10.1145/2379776.2379787 – ident: e_1_2_1_18_1 doi: 10.1111/1467-9280.00272 – ident: e_1_2_1_26_1 doi: 10.1145/800027.808424 – ident: e_1_2_1_42_1 doi: 10.1002/stvr.435 – ident: e_1_2_1_28_1 doi: 10.1145/1297846.1297902 – ident: e_1_2_1_35_1 doi: 10.1109/TSE.2005.97 – ident: e_1_2_1_3_1 doi: 10.1002/stvr.1486 – ident: e_1_2_1_20_1 doi: 10.1109/ASE.2011.6100138 – ident: e_1_2_1_12_1 doi: 10.1109/TSE.2012.14 – ident: e_1_2_1_21_1 doi: 10.1109/TSE.2002.1027796 – ident: e_1_2_1_38_1 doi: 10.5555/1792786.1792798 – ident: e_1_2_1_4_1 doi: 10.1109/ICST.2010.54 – volume-title: Proceedings of the 24th International IEEE Symposium on Proceedings of Software Reliability Engineering (ISSRE). 380--389 ident: e_1_2_1_24_1 – ident: e_1_2_1_32_1 doi: 10.1109/SEAA.2012.42 – ident: e_1_2_1_34_1 doi: 10.1109/32.799955 – ident: e_1_2_1_39_1 doi: 10.1145/1007512.1007528 – ident: e_1_2_1_8_1 doi: 10.1109/TSE.2006.92 – volume-title: Proceedings of the ACM/IEEE International Conference on Software Engineering (ICSE). 178--188 ident: e_1_2_1_11_1 – ident: e_1_2_1_22_1 doi: 10.1007/s10710-010-9112-3 – ident: e_1_2_1_30_1 doi: 10.1145/1858996.1859035 – ident: e_1_2_1_27_1 doi: 10.1145/1572272.1572280 – volume-title: Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. 
357--367 ident: e_1_2_1_33_1 – ident: e_1_2_1_9_1 doi: 10.1145/2025113.2025179 – ident: e_1_2_1_14_1 doi: 10.1109/ICST.2011.53 – ident: e_1_2_1_29_1 doi: 10.1145/2001420.2001445 – ident: e_1_2_1_25_1 doi: 10.1002/stvr.v14:2 – ident: e_1_2_1_13_1 doi: 10.1145/2483760.2483774 – ident: e_1_2_1_23_1 doi: 10.1016/j.jss.2010.07.026 – ident: e_1_2_1_36_1 doi: 10.5555/2337223.2337326 – ident: e_1_2_1_31_1 doi: 10.1109/ICST.2013.13 – ident: e_1_2_1_41_1 doi: 10.1145/1985793.1985820 – ident: e_1_2_1_15_1 doi: 10.1109/TSE.2011.93 – ident: e_1_2_1_40_1 doi: 10.1016/S0950-5849(01)00190-2 – ident: e_1_2_1_17_1 doi: 10.1109/TSE.2009.71 – ident: e_1_2_1_2_1 doi: 10.1145/1062455.1062530 – ident: e_1_2_1_37_1 doi: 10.1145/2338965.2336776 |
SSID | ssj0013024 |
Score | 2.3592875 |
Snippet | Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the... |
SourceID | proquest crossref |
SourceType | Aggregation Database Enrichment Source Index Database |
StartPage | 1 |
SubjectTerms | Automation Communities Computer programs Construction specifications Developers Hand tools Specifications Tasks |
Title | Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study |
URI | https://www.proquest.com/docview/1770333018 |
Volume | 24 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV1bi9NAFB5q98UX7-K6KiOoLyWayVzSPEnZi4sYH2wX9skwmUxgoduWTYroL_HneuaWNLuCl5dQQmagOV_OZc4530HoFWcJoYon8ImDGBjJsqgUtYyEEqw0FGmuiCb_LE7P2Mdzfj4a_dypWtq25Vv147d9Jf8jVbgHcjVdsv8g2W5TuAG_Qb5wBQnD9a9kfLTWzWS2bdfgdmrb9ddOFqDmPZm0Fe0X8ASX34152UzmoHO_mVKvheVHaF7TE9uWbqvVl7DD8eXmwnGGzDva2UBRe5ibeRJhuLjNMjRhP92zGtpshJtLPTixBw-5cfD4APFun_wHb1e27shbtl0NTq7yC-dZmxOfHpUmOdDVYcrdIwvCu4K5oGUhLIkotbNywAh5zcvTKKXZQDW79moPQbajZ8mOwXaUpzdNATOsGYmAgM4NDhySbV8zgl1pomvU5oVfeAvtJRCAJGO0NzvKP837DFXsBiaHv-Iass3Sd37p0NMZGnrrvSzuoTs-7MAzh6H7aKRXD9DdMNIDew3_EH01kMIdpLCBFDZowT2ksIMUNpDCAVLYQ-o9nuEeULgDFLaAeoTOTo4Xh6eRH8ERKUpIG0kIDyR4oUIyVpOay5hVWqckqctSyZRxLljFK1pTGQut6FRIqhhnaqrrmJeCPkbj1XqlnyCcphVRgoIvJCmrsgT8xJTVMRhjrjJRi330JryuQnl-ejMmZVlcE8k-wt2DG0fJcvORl-F9F6AuTQ5MrvR62xQkBRNHwapNn_55mwN0u4fvMzRur7b6OfigbfnCw-EXqOGImA |
linkProvider | EBSCOhost |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Does+Automated+Unit+Test+Generation+Really+Help+Software+Testers%3F+A+Controlled+Empirical+Study&rft.jtitle=ACM+transactions+on+software+engineering+and+methodology&rft.au=Fraser%2C+Gordon&rft.au=Staats%2C+Matt&rft.au=McMinn%2C+Phil&rft.au=Arcuri%2C+Andrea&rft.date=2015-08-01&rft.issn=1049-331X&rft.eissn=1557-7392&rft.volume=24&rft.issue=4&rft.spage=1&rft.epage=49&rft_id=info:doi/10.1145%2F2699688&rft.externalDBID=n%2Fa&rft.externalDocID=10_1145_2699688 |
thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1049-331X&client=summon |
thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1049-331X&client=summon |
thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1049-331X&client=summon |