Semantic-Enhanced Indirect Call Analysis with Large Language Models

Bibliographic Details
Published in	IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 430 - 442
Main Authors	Cheng, Baijun; Zhang, Cen; Wang, Kailong; Shi, Ling; Liu, Yang; Wang, Haoyu; Guo, Yao; Li, Ding; Chen, Xiangqun
Format	Conference Proceeding
Language	English
Published	ACM 27.10.2024
Subjects	Codes; Indirect-call analysis; Large language models; Limiting; LLM; Natural languages; Programming; Semantic analysis; Semantics; Software development management; Software engineering; Software maintenance; Static analysis
ISSN	2643-1572
DOI	10.1145/3691620.3695016

Summary:	In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the program, limiting their effectiveness in real-world scenarios. To address these issues, this paper proposes Semantic-Enhanced Analysis (SEA), a new approach to enhance the effectiveness of indirect call analysis. Our fundamental insight is that for common programming practices, indirect calls often exhibit semantic similarity with their invoked targets. This semantic alignment serves as a supportive mechanism for static analysis techniques in filtering out false targets. Notably, contemporary large language models (LLMs) are trained on extensive code corpora, encompassing tasks such as code summarization, making them well-suited for semantic analysis. Specifically, SEA leverages LLMs to generate natural language summaries of both indirect calls and target functions from multiple perspectives. Through further analysis of these summaries, SEA can determine their suitability as caller-callee pairs. Experimental results demonstrate that SEA can significantly enhance existing static analysis methods by producing more precise target sets for indirect calls. CCS Concepts: Software and its engineering → Software maintenance tools.
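
The filtering idea in the summary (compare a natural language summary of an indirect call with summaries of its candidate targets, and drop targets that do not match) can be illustrated with a minimal Python sketch. This is not the SEA implementation. The summaries are hard-coded where an LLM would produce them, a simple token-overlap score stands in for the LLM-based analysis of the summaries, and the names `token_overlap`, `filter_targets`, `gzip_compress`, `log_error`, and the 0.2 threshold are all hypothetical.

```python
from typing import Callable, Dict, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase tokens.

    Assumption: a crude stand-in for an LLM judging whether two
    summaries describe a plausible caller-callee pair.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_targets(
    call_summary: str,
    candidate_summaries: Dict[str, str],
    threshold: float = 0.2,  # hypothetical cut-off, chosen for this example
    match: Callable[[str, str], float] = token_overlap,
) -> List[str]:
    """Keep only candidates whose summary is similar enough to the call's."""
    return [
        name
        for name, summary in candidate_summaries.items()
        if match(call_summary, summary) >= threshold
    ]


# Hard-coded summaries (assumption: an LLM would generate these).
call_summary = "invoke handler to compress the output buffer"
candidates = {
    # Candidate targets as a type-based static analysis might report them.
    "gzip_compress": "compress the output buffer with gzip",
    "log_error": "write an error message to the log file",
}

kept = filter_targets(call_summary, candidates)
print(kept)
```

In this toy input the static analysis reports two candidate targets, and the semantically unrelated one falls below the threshold and is filtered out. Passing a different `match` function is where an LLM-backed comparison could be plugged in.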