TerzoN: Human-in-the-Loop Software Testing with a Composite Oracle
Software testing is difficult, tedious, and may consume 28%–50% of software engineering labor. Automatic test generators aim to ease this burden but have important trade-offs. Fuzzers use an implicit oracle that can detect obviously invalid results, but the oracle problem has no general solution, an...
Saved in:
Published in | Proceedings of the ACM on software engineering Vol. 2; no. FSE; pp. 1983 - 2005 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
New York, NY, USA
ACM
19.06.2025
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Software testing is difficult, tedious, and may consume 28%–50% of software engineering labor. Automatic test generators aim to ease this burden but have important trade-offs. Fuzzers use an implicit oracle that can detect obviously invalid results, but the oracle problem has no general solution, and an implicit oracle cannot automatically evaluate correctness. Test suite generators like EvoSuite use the program under test as the oracle and therefore cannot evaluate correctness. Property-based testing tools evaluate correctness, but users have difficulty coming up with properties to test and understanding whether their properties are correct. Consequently, practitioners create many test suites manually and often use an example-based oracle to tediously specify correct input and output examples. To help bridge the gaps among various oracle and tool types, we present the Composite Oracle, which organizes various oracle types into a hierarchy and renders a single test result per example execution. To understand the Composite Oracle’s practical properties, we built TerzoN, a test suite generator that includes a particular instantiation of the Composite Oracle. TerzoN displays all the test results in an integrated view composed from the results of three types of oracles and finds some types of test assertion inconsistencies that might otherwise lead to misleading test results. We evaluated TerzoN in a randomized controlled trial with 14 professional software engineers with a popular industry tool, fast-check, as the control. Participants using TerzoN elicited 72% more bugs (p < 0.01), accurately described more than twice the number of bugs (p < 0.01) and tested 16% more quickly (p < 0.05) relative to fast-check. |
---|---|
ISSN: | 2994-970X 2994-970X |
DOI: | 10.1145/3729359 |