Code Search Oriented Node-Enhanced Control Flow Graph Embedding

Bibliographic Details
Published in: Proceedings / IEEE International Working Conference on Source Code Analysis and Manipulation, pp. 59-70
Main Authors: Xu, Yang; Peng, Wenliang
Format: Conference Proceeding
Language: English
Published: IEEE, 07.10.2024

Summary: Code search aims to return code snippets that match a given query. Improving the accuracy of matching between heterogeneous natural language queries and highly structured programming language source code is a fundamental issue in code search. The semantics expressed by a code statement depend not only on the statement itself but also on the context in which it appears. The Control Flow Graph (CFG) captures the sequence, branching, and looping relationships in the execution of program statements, which is important contextual information for understanding the functional semantics of code. Moreover, CFG statements carry not only textual features but also syntax structure features. However, existing methods fail to exploit these characteristics of the CFG effectively, which limits search accuracy. To address this issue, this paper constructs a node-enhanced Control Flow Graph (node-enhanced CFG) by assigning text and syntax structure attributes to CFG nodes, and proposes a Code Search method based on Node-enhanced CFG embedding, called NCFG-CS. To fully extract the features of the node-enhanced CFG, we employ an asymptotic fusion strategy: we first fuse the text features and syntax structure features of each code statement, and then merge the features across the entire node-enhanced CFG. In experiments on public datasets against existing advanced multimodal methods, NCFG-CS improves MRR by at least 5%. Ablation experiments indicate that the syntax structure features of statements contribute more to code search. Additionally, to verify the performance of NCFG-CS in real search scenarios and the generalization of the experimental results, we test the model on the code-query datasets provided by Gu et al. This experiment again verified the effectiveness of NCFG-CS and demonstrated its good generalization ability.
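The two-stage (asymptotic) fusion described in the summary can be illustrated with a small sketch. Everything here is an assumption for illustration, not the NCFG-CS implementation: the names `CFGNode`, `NodeEnhancedCFG`, `fuse_node`, and `embed_graph` are hypothetical, statement features are taken as precomputed vectors, statement-level fusion is a fixed weighted sum with weight `alpha`, and graph-level merging is one round of mean aggregation over control-flow predecessors followed by mean pooling. The paper's learned encoders and fusion layers are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass
class CFGNode:
    """One statement of a node-enhanced CFG (hypothetical layout)."""
    text: Vector    # textual features of the statement, assumed precomputed
    syntax: Vector  # syntax-structure features, assumed precomputed


@dataclass
class NodeEnhancedCFG:
    nodes: Dict[int, CFGNode] = field(default_factory=dict)
    # Directed control-flow edges (sequence, branch, or loop-back)
    edges: List[Tuple[int, int]] = field(default_factory=list)


def fuse_node(node: CFGNode, alpha: float = 0.5) -> Vector:
    """Stage 1: fuse one statement's text and syntax features.

    A fixed weighted sum is used here purely for illustration."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(node.text, node.syntax)]


def embed_graph(cfg: NodeEnhancedCFG, alpha: float = 0.5) -> Vector:
    """Stage 2: merge the fused statement vectors over the whole graph."""
    fused = {nid: fuse_node(node, alpha) for nid, node in cfg.nodes.items()}
    dim = len(next(iter(fused.values())))
    # Collect each node's control-flow predecessors
    preds: Dict[int, List[int]] = {nid: [] for nid in fused}
    for src, dst in cfg.edges:
        preds[dst].append(src)
    updated: Dict[int, Vector] = {}
    for nid, vec in fused.items():
        ps = preds[nid]
        # Mean of predecessor vectors supplies execution context (zeros if none)
        ctx = [sum(fused[p][i] for p in ps) / len(ps) if ps else 0.0 for i in range(dim)]
        updated[nid] = [v + c for v, c in zip(vec, ctx)]
    # Mean-pool the context-aware node vectors into one graph embedding
    return [sum(v[i] for v in updated.values()) / len(updated) for i in range(dim)]


# Usage: statement 0 flows into statement 1
cfg = NodeEnhancedCFG()
cfg.nodes[0] = CFGNode(text=[1.0, 0.0], syntax=[1.0, 0.0])
cfg.nodes[1] = CFGNode(text=[0.0, 1.0], syntax=[0.0, 1.0])
cfg.edges.append((0, 1))
print(embed_graph(cfg))  # → [1.0, 0.5]
```

A trained model would replace the fixed weighted sum and mean aggregation with learnable layers; the sketch only shows the order of operations, statement-level fusion first and whole-graph merging second.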
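MRR (Mean Reciprocal Rank), the metric behind the reported improvement, averages 1/rank of the correct snippet over all queries, counting 0 when the snippet is not retrieved. The sketch below assumes one ground-truth snippet per query; the function names and the optional top-k cutoff `k` are illustrative assumptions, since the summary does not state whether a cutoff was used.

```python
from typing import List, Optional, Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str,
                    k: Optional[int] = None) -> float:
    """Return 1/rank of the ground-truth snippet, or 0.0 if it is missing.

    If k is given, only the top-k results are inspected (an assumed cutoff)."""
    candidates = ranked_ids if k is None else ranked_ids[:k]
    for rank, snippet_id in enumerate(candidates, start=1):
        if snippet_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]], truths: List[str],
                         k: Optional[int] = None) -> float:
    """Average the reciprocal ranks over all queries."""
    if not truths or len(rankings) != len(truths):
        raise ValueError("need exactly one ranked list per query")
    return sum(reciprocal_rank(r, t, k) for r, t in zip(rankings, truths)) / len(truths)


# Usage: hits at rank 1 and rank 2, and one miss
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
truths = ["a", "e", "x"]
print(mean_reciprocal_rank(rankings, truths))  # → 0.5
```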
ISSN: 2470-6892
DOI: 10.1109/SCAM63643.2024.00016