TerzoN: Human-in-the-Loop Software Testing with a Composite Oracle

Bibliographic Details
Published in	Proceedings of the ACM on Software Engineering, Vol. 2, no. FSE, pp. 1983-2005
Main Authors	Davis, Matthew C., Wei, Amy, Myers, Brad A., Sunshine, Joshua
Format	Journal Article
Language	English
Published	New York, NY, USA: ACM, 19.06.2025
Subjects	Human-centered computing; Software and its engineering; Software testing and debugging; User studies; Empirical software engineering; software testing; usable testing; user study; experiments; automatic test generation; composite oracle; human subjects

More Information
Summary:	Software testing is difficult, tedious, and may consume 28%–50% of software engineering labor. Automatic test generators aim to ease this burden but have important trade-offs. Fuzzers use an implicit oracle that can detect obviously invalid results, but the oracle problem has no general solution, and an implicit oracle cannot automatically evaluate correctness. Test suite generators like EvoSuite use the program under test as the oracle and therefore cannot evaluate correctness. Property-based testing tools evaluate correctness, but users have difficulty coming up with properties to test and understanding whether their properties are correct. Consequently, practitioners create many test suites manually and often use an example-based oracle to tediously specify correct input and output examples. To help bridge the gaps among various oracle and tool types, we present the Composite Oracle, which organizes various oracle types into a hierarchy and renders a single test result per example execution. To understand the Composite Oracle’s practical properties, we built TerzoN, a test suite generator that includes a particular instantiation of the Composite Oracle. TerzoN displays all the test results in an integrated view composed from the results of three types of oracles and finds some types of test assertion inconsistencies that might otherwise lead to misleading test results. We evaluated TerzoN in a randomized controlled trial with 14 professional software engineers, using a popular industry tool, fast-check, as the control. Participants using TerzoN elicited 72% more bugs (p < 0.01), accurately described more than twice the number of bugs (p < 0.01), and tested 16% more quickly (p < 0.05) relative to fast-check.
ISSN:	2994-970X
DOI:	10.1145/3729359
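
The summary describes a Composite Oracle that organizes several oracle types into a hierarchy and renders one result per example execution. The TypeScript sketch below is only an illustration of that idea, not TerzoN's implementation or API: every name in it (`Oracle`, `runComposite`, `buggyAbs`, and so on) is hypothetical, and the combination rule (any failure wins, otherwise any pass wins, otherwise the result is inconclusive) is an assumption, since the record does not say how the hierarchy is resolved. A pass from one oracle alongside a failure from another is flagged as a possible assertion inconsistency.

```typescript
// Hypothetical sketch: names and the precedence rule are assumptions,
// not taken from TerzoN or from fast-check.
type Verdict = "pass" | "fail" | "inconclusive";

interface Oracle<I, O> {
  name: string;
  // `output` is undefined and `error` is set when the program threw.
  judge(input: I, output: O | undefined, error: unknown): Verdict;
}

interface CompositeResult {
  verdict: Verdict;
  decidedBy: string | null; // oracle that determined the verdict
  inconsistent: boolean;    // one oracle passed while another failed
}

// Implicit oracle: a crash is a failure; anything else says nothing about correctness.
function implicitOracle<I, O>(): Oracle<I, O> {
  return {
    name: "implicit",
    judge: (_input, _output, error) => (error === undefined ? "inconclusive" : "fail"),
  };
}

// Property oracle: a predicate that must hold for every input/output pair.
function propertyOracle<I, O>(name: string, holds: (input: I, output: O) => boolean): Oracle<I, O> {
  return {
    name,
    judge: (input, output, error) => {
      if (error !== undefined || output === undefined) return "inconclusive";
      return holds(input, output) ? "pass" : "fail";
    },
  };
}

// Example oracle: only has an opinion about inputs it has a recorded answer for.
function exampleOracle<I, O>(examples: Map<I, O>): Oracle<I, O> {
  return {
    name: "example",
    judge: (input, output, error) => {
      if (!examples.has(input)) return "inconclusive";
      return error === undefined && output === examples.get(input) ? "pass" : "fail";
    },
  };
}

// Runs the program once and folds all oracle verdicts into a single result.
function runComposite<I, O>(program: (input: I) => O, oracles: Oracle<I, O>[], input: I): CompositeResult {
  let output: O | undefined = undefined;
  let error: unknown = undefined;
  try {
    output = program(input);
  } catch (e) {
    error = e ?? new Error("program threw");
  }
  const verdicts = oracles.map((o) => ({ name: o.name, verdict: o.judge(input, output, error) }));
  const firstFail = verdicts.find((v) => v.verdict === "fail");
  const firstPass = verdicts.find((v) => v.verdict === "pass");
  const decided = firstFail ?? firstPass;
  return {
    verdict: decided ? decided.verdict : "inconclusive",
    decidedBy: decided ? decided.name : null,
    inconsistent: firstFail !== undefined && firstPass !== undefined,
  };
}

// Usage: a deliberately buggy absolute-value function.
const buggyAbs = (n: number): number => (n === -3 ? -3 : Math.abs(n));
const oracles: Oracle<number, number>[] = [
  exampleOracle(new Map<number, number>([[2, 2], [-2, 2]])),
  propertyOracle<number, number>("non-negative", (_n, out) => out >= 0),
  implicitOracle<number, number>(),
];

const known = runComposite(buggyAbs, oracles, -2);  // covered by an example
const buggy = runComposite(buggyAbs, oracles, -3);  // caught by the property
// A wrong example (-2 -> -2) disagrees with the property on a correct program.
const wrongExample = runComposite(
  Math.abs,
  [exampleOracle(new Map<number, number>([[-2, -2]])), oracles[1]],
  -2,
);
```

In this sketch the wrong example produces a failing verdict with `inconsistent` set, which is one way a tool could surface the kind of misleading assertion the summary mentions.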