Planning a Large Language Model for Static Detection of Runtime Errors in Code Snippets
Large Language Models (LLMs) have been excellent in generating and reasoning about source code and natural-language texts. They can recognize patterns, syntax, and semantics in code, making them effective in several software engineering tasks. However, they exhibit weaknesses in reasoning about the...
Saved in:
Published in | Proceedings / International Conference on Software Engineering pp. 872 - 884 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
26.04.2025
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Large Language Models (LLMs) have been excellent in generating and reasoning about source code and natural-language texts. They can recognize patterns, syntax, and semantics in code, making them effective in several software engineering tasks. However, they exhibit weaknesses in reasoning about the program execution. They primarily operate on static code representations, failing to capture the dynamic behavior and state changes that occur during program execution. In this paper, we advance the capabilities of LLMs in reasoning about dynamic program behaviors. We propose Orca, a novel approach that instructs an LLM to autonomously formulate a plan to navigate through a control flow graph (CFG) for predictive execution of (in)complete code snippets. It acts as a predictive interpreter to "execute" the code. In Orca, we guide the LLM to pause at the branching point, focusing on the state of the symbol tables for variables' values, thus minimizing error propagation in the LLM's computation. We instruct the LLM not to stop at each step in its execution plan, resulting the use of only one prompt for the entire predictive interpreter, thus much cost-saving. As a downstream task, we use Orca to statically identify any runtime errors for online code snippets. Early detection of runtime errors and defects in these snippets is crucial to prevent costly fixes later in the development cycle after they were adapted into a codebase. Our empirical evaluation showed that Orca is effective and improves over the state-of-the-art approaches in predicting the execution traces and in static detection of runtime errors. |
---|---|
ISSN: | 1558-1225 |
DOI: | 10.1109/ICSE55347.2025.00102 |