RUG: Turbo LLM for Rust Unit Test Generation

Bibliographic Details
Published in	Proceedings / International Conference on Software Engineering, pp. 2983-2995
Main Authors	Cheng, Xiang; Sang, Fan; Zhai, Yizhuo; Zhang, Xiaokuan; Kim, Taesoo
Format	Conference Proceeding
Language	English
Published	IEEE 26.04.2025
Subjects	Codes; Fuzzing; Large language model; Large language models; Reviews; Rust; Scalability; Software engineering; Software quality; Source coding; Test pattern generators; Testing; Unit testing
Summary:	Unit testing improves software quality by evaluating isolated sections of the program. This approach alleviates the need for comprehensive program-wide testing and confines the potential error scope within the software. However, unit test development is time-consuming, requiring developers to create appropriate test contexts and determine input values to cover different code regions. This problem is particularly pronounced in Rust due to its intricate type system, making traditional unit test generation tools ineffective in Rust projects. Recently, large language models (LLMs) have demonstrated their proficiency in understanding programming languages and completing software engineering tasks. However, merely prompting LLMs with a basic prompt like "generate unit test for the following source code" often results in code with compilation errors. In addition, LLM-generated unit tests often have limited test coverage. To bridge this gap and harness the capabilities of LLMs, we design and implement RUG, an end-to-end solution to automatically generate unit tests for Rust projects. To help LLM-generated tests pass Rust's strict compilation checks, RUG designs a semantic-aware bottom-up approach to divide the context construction problem into dependent sub-problems. It solves these sub-problems sequentially using an LLM and merges them into a complete context. To increase test coverage, RUG integrates coverage-guided fuzzing with the LLM to prepare fuzzing harnesses. Applying RUG to 17 real-world Rust programs (average 24,937 LoC), we show that RUG can achieve high code coverage, up to 71.37%, closely comparable to human effort (73.18%). We submitted 113 unit tests generated by RUG covering the new code: 53 of them have been accepted, 17 rejected, and 43 are pending review.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00097
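
Illustrative note: the summary describes dividing context construction into dependent sub-problems that are solved bottom-up and then merged. The sketch below is not RUG's implementation. It only illustrates the ordering idea under stated assumptions: a hypothetical, acyclic map from each type to the types needed to construct it. The names `bottom_up_order`, `example_deps`, `Config`, `Theme`, `Font`, and `Color` are invented for this example, and in the paper's approach each sub-problem is handed to an LLM rather than resolved by a lookup.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the types reachable from `target` in bottom-up order:
/// each type appears only after every type it depends on.
/// Assumption: the dependency graph is acyclic (illustrative only).
fn bottom_up_order(deps: &BTreeMap<&str, Vec<&str>>, target: &str) -> Vec<String> {
    fn visit(
        deps: &BTreeMap<&str, Vec<&str>>,
        ty: &str,
        seen: &mut BTreeSet<String>,
        order: &mut Vec<String>,
    ) {
        // Skip types that were already scheduled.
        if !seen.insert(ty.to_string()) {
            return;
        }
        // Schedule dependencies first (post-order traversal).
        if let Some(children) = deps.get(ty) {
            for &child in children {
                visit(deps, child, seen, order);
            }
        }
        order.push(ty.to_string());
    }

    let mut seen = BTreeSet::new();
    let mut order = Vec::new();
    visit(deps, target, &mut seen, &mut order);
    order
}

/// Hypothetical dependencies for testing `fn render(cfg: &Config)`:
/// `Config` needs a `Theme` and a `Font`; `Theme` needs a `Color`.
fn example_deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut deps = BTreeMap::new();
    deps.insert("Config", vec!["Theme", "Font"]);
    deps.insert("Theme", vec!["Color"]);
    deps
}

fn main() {
    // Each printed line stands for one sub-problem to solve in sequence.
    for ty in bottom_up_order(&example_deps(), "Config") {
        println!("construct {ty}");
    }
}
```

Ordering leaf types first means that, by the time a sub-problem for a composite type is posed, a concrete way to build each of its inputs already exists and can be reused when the pieces are merged into one test context.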