Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study

Bibliographic Details
Published in: ACM Transactions on Software Engineering and Methodology, Vol. 24, No. 4, pp. 1–49
Main Authors: Fraser, Gordon; Staats, Matt; McMinn, Phil; Arcuri, Andrea; Padberg, Frank
Format: Journal Article
Language: English
Published: 01.08.2015
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/2699688

Abstract: Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the task of testing for the developer, as testing is reduced to checking the results of tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the limited adoption in industry indicates this assumption may not be correct, and calls into question the practical value of test generation tools. To investigate this issue, we performed two controlled experiments comparing a total of 97 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). However, on the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.
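The abstract's central point is the oracle problem: a generated test supplies inputs and records observed behavior, but only a human can judge whether that behavior is correct. A minimal, hypothetical Java sketch of this situation (this is illustrative only, not actual EvoSuite output; the Rational class, its bug, and the asserted value are invented for the example):

```java
// Illustrative sketch of the test-oracle problem described in the abstract.
// A generated-style test covers the code and asserts the *observed* result;
// the developer must still verify that this result is actually correct.

class Rational {
    private final int num;
    private final int den;

    Rational(int num, int den) {
        // Hypothetical defect: a zero denominator is silently accepted here,
        // so intValue() would later throw ArithmeticException.
        this.num = num;
        this.den = den;
    }

    int intValue() {
        return num / den;
    }
}

public class GeneratedTestSketch {
    public static void main(String[] args) {
        // The generated input exercises constructor and intValue(),
        // achieving full statement coverage of the happy path...
        Rational r = new Rational(6, 3);

        // ...but the expected value below was captured from the program's
        // own output. High coverage alone cannot tell the tester whether
        // 2 is the right answer, or whether new Rational(1, 0) should
        // have been rejected: that judgment is the manual oracle check.
        if (r.intValue() != 2) {
            throw new AssertionError("oracle check failed: " + r.intValue());
        }
        System.out.println(r.intValue());
    }
}
```

This mirrors the paper's finding: the tool can drive coverage up, yet the bug-finding step still depends on the tester inspecting each asserted value.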
Authors and Affiliations:
1. Fraser, Gordon (University of Sheffield, Sheffield, UK)
2. Staats, Matt (University of Luxembourg, Luxembourg)
3. McMinn, Phil (University of Sheffield, Sheffield, UK)
4. Arcuri, Andrea (Simula Research Laboratory, Lysaker, Norway)
5. Padberg, Frank (Karlsruhe Institute of Technology, Karlsruhe, Germany)
Subjects: Automation; Communities; Computer programs; Construction specifications; Developers; Hand tools; Specifications; Tasks