AIDetectVul: Software Vulnerability Detection Method Based on Feature Fusion of Pre-trained Models

Bibliographic Details
Published in 2025 5th International Conference on Consumer Electronics and Computer Engineering (ICCECE), pp. 258-263
Main Authors Xue, Shiying; Li, Lin; Li, Tao; Chen, Haodong; Li, Jiapan; Qin, Yangqing
Format Conference Proceeding
Language English
Published IEEE 28.02.2025
DOI 10.1109/ICCECE65250.2025.10985370

Summary: Data-driven deep learning models are constrained by the scale and diversity of their training data, which makes them vulnerable to data bias. Large language models (LLMs) generalize better in vulnerability detection, but their low inference efficiency and high computational cost hinder practical deployment in industrial settings. To address these limitations, we propose AIDetectVul, a vulnerability detection framework that fuses features from pre-trained models. Our approach uses encoder-only and decoder-only architectures in parallel to extract complementary code embeddings, and fuses them to increase semantic diversity. The fused representations are then processed by a Transformer model, whose self-attention mechanism captures long-range code dependencies, improving both detection accuracy and generalization. Evaluations on proprietary enterprise datasets and open-source benchmarks show that AIDetectVul achieves detection accuracy comparable to the state-of-the-art LineVul model while delivering measurable improvements in generalization. Compared with LLM-based approaches, our solution has significantly lower computational overhead and training cost, making it particularly suitable for industrial applications.
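
The pipeline outlined in the summary (two embedding sources, feature fusion, then self-attention) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The summary does not state the fusion operator, the pre-trained models, or how the two embedding sequences are aligned, so the following are assumptions made only for illustration: fusion is per-position concatenation, both models yield one vector per position, attention is a single head with identity projections (no learned weights), and the function names `fuse`, `self_attention`, and `mean_pool` are hypothetical.

```python
import math


def fuse(enc_emb, dec_emb):
    """Fuse two embedding sequences by per-position concatenation.

    Assumption: concatenation is only one possible fusion operator;
    the summary does not say which one the paper uses.
    """
    if len(enc_emb) != len(dec_emb):
        raise ValueError("both sequences must have the same length")
    return [e + d for e, d in zip(enc_emb, dec_emb)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections); a real Transformer learns these projections.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Every position attends to every other position, which is how
        # self-attention can relate distant parts of a code sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def mean_pool(x):
    """Average over positions to get one vector per code sample."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


# Toy stand-ins for embeddings from an encoder-only and a decoder-only model.
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec = [[0.5], [0.2], [0.9]]

fused = fuse(enc, dec)            # 3 positions, 2 + 1 = 3 features each
attended = self_attention(fused)  # same shape as fused
pooled = mean_pool(attended)      # one 3-dimensional vector
```

In a trained system the pooled vector would feed a classification head that predicts whether the code is vulnerable; that head, like the learned attention projections, is omitted here.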