Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study

Bibliographic Details
Published in: ACM Transactions on Software Engineering and Methodology, Vol. 24, No. 4, pp. 1–49
Main Authors: Fraser, Gordon; Staats, Matt; McMinn, Phil; Arcuri, Andrea; Padberg, Frank
Format: Journal Article
Language: English
Published: 01.08.2015
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/2699688

Abstract: Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the task of testing for the developer, as testing is reduced to checking the results of tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the limited adoption in industry indicates this assumption may not be correct, and calls into question the practical value of test generation tools. To investigate this issue, we performed two controlled experiments comparing a total of 97 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). However, on the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.
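The abstract's central point is the oracle problem: a generated test supplies inputs and records observed behavior, but only a human can judge whether that behavior is correct. A minimal, hypothetical Java sketch of this situation (this is illustrative only, not actual EvoSuite output; the Rational class, its bug, and the asserted value are invented for the example):

```java
// Illustrative sketch of the test-oracle problem described in the abstract.
// A generated-style test covers the code and asserts the *observed* result;
// the developer must still verify that this result is actually correct.

class Rational {
    private final int num;
    private final int den;

    Rational(int num, int den) {
        // Hypothetical defect: a zero denominator is silently accepted here,
        // so intValue() would later throw ArithmeticException.
        this.num = num;
        this.den = den;
    }

    int intValue() {
        return num / den;
    }
}

public class GeneratedTestSketch {
    public static void main(String[] args) {
        // The generated input exercises constructor and intValue(),
        // achieving full statement coverage of the happy path...
        Rational r = new Rational(6, 3);

        // ...but the expected value below was captured from the program's
        // own output. High coverage alone cannot tell the tester whether
        // 2 is the right answer, or whether new Rational(1, 0) should
        // have been rejected: that judgment is the manual oracle check.
        if (r.intValue() != 2) {
            throw new AssertionError("oracle check failed: " + r.intValue());
        }
        System.out.println(r.intValue());
    }
}
```

This mirrors the paper's finding: the tool can drive coverage up, yet the bug-finding step still depends on the tester inspecting each asserted value.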
Authors and Affiliations:
1. Fraser, Gordon (University of Sheffield, Sheffield, UK)
2. Staats, Matt (University of Luxembourg, Luxembourg)
3. McMinn, Phil (University of Sheffield, Sheffield, UK)
4. Arcuri, Andrea (Simula Research Laboratory, Lysaker, Norway)
5. Padberg, Frank (Karlsruhe Institute of Technology, Karlsruhe, Germany)
Subjects: Automation; Communities; Computer programs; Construction specifications; Developers; Hand tools; Specifications; Tasks