Semantic-Enhanced Indirect Call Analysis with Large Language Models

Bibliographic Details
Published in IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 430-442
Main Authors Cheng, Baijun; Zhang, Cen; Wang, Kailong; Shi, Ling; Liu, Yang; Wang, Haoyu; Guo, Yao; Li, Ding; Chen, Xiangqun
Format Conference Proceeding
Language English
Published ACM 27.10.2024
ISSN 2643-1572
DOI 10.1145/3691620.3695016

Abstract In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the program, limiting their effectiveness in real-world scenarios. To address these issues, this paper proposes Semantic-Enhanced Analysis (SEA), a new approach to enhance the effectiveness of indirect call analysis. Our fundamental insight is that for common programming practices, indirect calls often exhibit semantic similarity with their invoked targets. This semantic alignment serves as a supportive mechanism for static analysis techniques in filtering out false targets. Notably, contemporary large language models (LLMs) are trained on extensive code corpora, encompassing tasks such as code summarization, making them well-suited for semantic analysis. Specifically, SEA leverages LLMs to generate natural language summaries of both indirect calls and target functions from multiple perspectives. Through further analysis of these summaries, SEA can determine their suitability as caller-callee pairs. Experimental results demonstrate that SEA can significantly enhance existing static analysis methods by producing more precise target sets for indirect calls.
CCS Concepts Software and its engineering → Software maintenance tools.
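
The workflow outlined in the abstract (a static analysis proposes candidate targets, an LLM summarizes the call site and each candidate, and a further check on the summaries filters out false targets) might be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the `Function` and `IndirectCall` types, the prompts, the signature-equality candidate step, and the `toy_llm` keyword-overlap stand-in are all assumptions introduced here so that the example runs without a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Function:
    name: str
    signature: str  # e.g. "int(char*, int)"
    body: str       # source text (a one-line description in this toy example)


@dataclass
class IndirectCall:
    signature: str  # type of the function pointer being called
    context: str    # source text around the call site


# Hypothetical model interface: prompt in, natural-language answer out.
LLM = Callable[[str], str]


def type_based_candidates(call: IndirectCall, functions: List[Function]) -> List[Function]:
    """Assumed baseline static analysis: keep functions whose signature matches."""
    return [f for f in functions if f.signature == call.signature]


def semantic_filter(call: IndirectCall, candidates: List[Function], llm: LLM) -> List[Function]:
    """Summarize both sides with the model, then ask it whether they pair up."""
    call_summary = llm(f"Summarize what this indirect call expects:\n{call.context}")
    kept = []
    for func in candidates:
        func_summary = llm(f"Summarize what this function does:\n{func.body}")
        verdict = llm(
            "Could the call invoke the function? Answer yes or no.\n"
            f"Call: {call_summary}\nFunction: {func_summary}"
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(func)
    return kept


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM (assumption, for demonstration only)."""
    if prompt.startswith("Summarize"):
        # Echo the code text back as its own "summary".
        return prompt.split("\n", 1)[1]
    # Verdict prompt: answer yes if the two summaries share a non-trivial word.
    lines = prompt.split("\n")
    stopwords = {"a", "the", "to", "with"}
    call_words = set(lines[1].lower().split()) - stopwords
    func_words = set(lines[2].lower().split()) - stopwords
    return "yes" if call_words & func_words else "no"


functions = [
    Function("zlib_compress", "int(char*, int)", "compress a buffer with deflate"),
    Function("log_message", "int(char*, int)", "write a log message to stderr"),
    Function("free_buffer", "void(char*)", "release a buffer"),
]
call = IndirectCall("int(char*, int)", "compress the output buffer")

candidates = type_based_candidates(call, functions)   # type match keeps two functions
targets = semantic_filter(call, candidates, toy_llm)  # semantic check drops log_message
print([f.name for f in targets])
```

In this toy run the type-based step alone cannot separate `zlib_compress` from `log_message`, since both share the pointer's signature; the summary comparison is what removes the second one. A real deployment would replace `toy_llm` with an actual model call and a more careful prompt design.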
Author affiliations
1. Cheng, Baijun (Peking University, China)
2. Zhang, Cen (Nanyang Technological University, Singapore)
3. Wang, Kailong (Huazhong University of Science and Technology, China)
4. Shi, Ling (Nanyang Technological University, Singapore)
5. Liu, Yang (Nanyang Technological University, Singapore)
6. Wang, Haoyu (Huazhong University of Science and Technology, China)
7. Guo, Yao (Peking University, China)
8. Li, Ding (Peking University, China)
9. Chen, Xiangqun (Peking University, China)
CODEN IEEPAD
ContentType Conference Proceeding
Discipline Computer Science
EISBN 9798400712487
EISSN 2643-1572
EndPage 442
ExternalDocumentID 10764802
Genre orig-research
GrantInformation_xml – fundername: National Science and Technology Major Project
  funderid: 10.13039/501100018537
– fundername: National Research Foundation
  funderid: 10.13039/501100001321
IsPeerReviewed false
IsScholarly true
Language English
PageCount 13
PublicationDate 2024-Oct.-27
PublicationDateYYYYMMDD 2024-10-27
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
StartPage 430
SubjectTerms Codes
Indirect-call analysis
Large language models
Limiting
LLM
Natural languages
Programming
Semantic analysis
Semantics
Software development management
Software engineering
Software maintenance
Static analysis
Title Semantic-Enhanced Indirect Call Analysis with Large Language Models
URI https://ieeexplore.ieee.org/document/10764802