Semantic-Enhanced Indirect Call Analysis with Large Language Models
Published in | Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, pp. 430–442 |
---|---|
Main Authors | Baijun Cheng, Cen Zhang, Kailong Wang, Ling Shi, Yang Liu, Haoyu Wang, Yao Guo, Ding Li, Xiangqun Chen |
Format | Conference Proceeding |
Language | English |
Published | ACM, 27 October 2024 |
ISSN | 2643-1572 |
DOI | 10.1145/3691620.3695016 |
Abstract | In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the program, limiting their effectiveness in real-world scenarios. To address these issues, this paper proposes Semantic-Enhanced Analysis (SEA), a new approach to enhance the effectiveness of indirect call analysis. Our fundamental insight is that for common programming practices, indirect calls often exhibit semantic similarity with their invoked targets. This semantic alignment serves as a supportive mechanism for static analysis techniques in filtering out false targets. Notably, contemporary large language models (LLMs) are trained on extensive code corpora, encompassing tasks such as code summarization, making them well-suited for semantic analysis. Specifically, SEA leverages LLMs to generate natural language summaries of both indirect calls and target functions from multiple perspectives. Through further analysis of these summaries, SEA can determine their suitability as caller-callee pairs. Experimental results demonstrate that SEA can significantly enhance existing static analysis methods by producing more precise target sets for indirect calls. CCS Concepts: Software and its engineering → Software maintenance tools. |
Author affiliations | Baijun Cheng (Peking University, China); Cen Zhang (Nanyang Technological University, Singapore); Kailong Wang (Huazhong University of Science and Technology, China); Ling Shi (Nanyang Technological University, Singapore); Yang Liu (Nanyang Technological University, Singapore); Haoyu Wang (Huazhong University of Science and Technology, China); Yao Guo (Peking University, China); Ding Li (Peking University, China); Xiangqun Chen (Peking University, China) |
Discipline | Computer Science |
EISBN | 9798400712487 |
EISSN | 2643-1572 |
Funding | National Science and Technology Major Project (funder ID 10.13039/501100018537); National Research Foundation (funder ID 10.13039/501100001321) |
PageCount | 13 |
PublicationTitleAbbrev | ASE |
SubjectTerms | Codes; Indirect-call analysis; Large language models; Limiting; LLM; Natural languages; Programming; Semantic analysis; Semantics; Software development management; Software engineering; Software maintenance; Static analysis |
URI | https://ieeexplore.ieee.org/document/10764802 |
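The abstract describes SEA only at a high level: an existing static analyzer proposes candidate targets for an indirect call, and semantic agreement between the call site and each candidate is used to filter out false targets. The Python sketch below illustrates that two-stage shape and is not the authors' implementation. Every name in it is hypothetical. Exact signature matching stands in for a real type-based analyzer, hand-written strings stand in for LLM-generated summaries, and Jaccard word overlap with an arbitrary 0.2 threshold stands in for the LLM-based judgment of caller-callee suitability.

```python
# Hypothetical sketch: prune a type-based indirect-call target set using
# semantic agreement between call-site and callee summaries.
# Assumptions (not from the paper): exact signature matching stands in for an
# existing static analyzer, hand-written strings stand in for LLM summaries,
# and Jaccard word overlap stands in for the LLM-based suitability judgment.
import re
from dataclasses import dataclass

STOPWORDS = {"a", "an", "the", "of", "to", "and", "for", "in", "on", "with", "is"}


@dataclass(frozen=True)
class Function:
    name: str
    signature: str  # normalized type signature, e.g. "int(void *, const char *, int)"
    summary: str    # stand-in for an LLM-generated natural-language summary


@dataclass(frozen=True)
class IndirectCall:
    site: str       # call expression at the indirect call site
    signature: str  # type of the function pointer being called
    summary: str    # stand-in for an LLM summary of what the call site intends


def tokens(text: str) -> set[str]:
    """Lowercased content words of a summary, minus a few stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def type_based_targets(call: IndirectCall, functions: list[Function]) -> list[Function]:
    """Stage 1: keep every function whose signature matches the pointer type."""
    return [f for f in functions if f.signature == call.signature]


def semantic_filter(call: IndirectCall, candidates: list[Function],
                    threshold: float = 0.2) -> list[Function]:
    """Stage 2: drop candidates whose summary barely overlaps the call site's."""
    call_words = tokens(call.summary)
    return [f for f in candidates if jaccard(call_words, tokens(f.summary)) >= threshold]


SIG = "int(void *, const char *, int)"
EXAMPLE_FUNCTIONS = [
    Function("tcp_send", SIG, "Send a buffer of bytes over a network socket."),
    Function("udp_send", SIG, "Send a datagram buffer over a network socket."),
    Function("log_write", SIG, "Write a log message to a file."),
    Function("parse_config", "int(const char *)", "Parse a configuration file."),
]
EXAMPLE_CALL = IndirectCall("ops->send(ctx, buf, len)", SIG,
                            "Send the buffer over the network socket.")

if __name__ == "__main__":
    candidates = type_based_targets(EXAMPLE_CALL, EXAMPLE_FUNCTIONS)
    print([f.name for f in candidates])  # ['tcp_send', 'udp_send', 'log_write']
    refined = semantic_filter(EXAMPLE_CALL, candidates)
    print([f.name for f in refined])     # ['tcp_send', 'udp_send']
```

In this toy run, `log_write` survives the type check because its signature matches, but its summary shares no content words with the call site, so the semantic stage removes it. Swapping `jaccard` for an LLM query, or averaging scores over summaries written from several perspectives as the abstract mentions, would keep the same two-stage structure; the threshold here is arbitrary and would need tuning.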