Source Code Vulnerability Detection: Combining Code Language Models and Code Property Graphs (2404.14719v1)

Published 23 Apr 2024 in cs.CR

Abstract: Deep learning is currently applied successfully to code vulnerability detection by learning from code sequences or code property graphs. However, sequence-based methods often overlook essential code attributes such as syntax, control flow, and data dependencies, whereas graph-based approaches may underestimate code semantics and struggle to capture long-distance contextual information. To address this gap, we propose Vul-LMGNN, a unified model that combines pre-trained code LLMs with code property graphs for code vulnerability detection. Vul-LMGNN constructs a code property graph that integrates various code attributes (including syntax, control flow, and data dependencies) into a unified graph structure, then uses a pre-trained code model to extract local semantic features as node embeddings in the code property graph. Furthermore, to effectively retain dependency information among the various attributes, we introduce a gated code Graph Neural Network (GNN). By jointly training the code LLM and the gated code GNN modules in Vul-LMGNN, our proposed method efficiently leverages the strengths of both mechanisms. Finally, we use a pre-trained CodeBERT as an auxiliary classifier, with the final detection result derived by learning a linear interpolation of Vul-LMGNN and CodeBERT. Evaluated on four real-world vulnerability datasets, the proposed method outperformed six state-of-the-art approaches. Our source code can be accessed at: https://github.com/Vul-LMGNN/vul-LMGGNN.
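The abstract combines two mechanisms: gated message passing over a code property graph whose edges carry different attribute types (syntax, control flow, data dependency), and a final score that linearly interpolates the GNN's output with an auxiliary CodeBERT classifier. The pure-Python sketch below illustrates both ideas under loudly stated assumptions and is not the authors' implementation. Scalar node states stand in for code-LM embedding vectors. The per-edge-type weights, the GRU parameters, and the readout are arbitrary toy choices. The interpolation weight `lam` is passed in rather than learned, and `p_bert` is a placeholder probability rather than a real CodeBERT output. All function names (`aggregate`, `gru_update`, `gated_gnn_prob`, `interpolate`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical typed edge in a toy code property graph: (src, dst, edge_type).
Edge = Tuple[int, int, str]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(h: Dict[int, float], edges: List[Edge],
              edge_w: Dict[str, float]) -> Dict[int, float]:
    """Sum incoming messages, scaling each by a per-edge-type weight (toy assumption)."""
    m = {v: 0.0 for v in h}
    for src, dst, etype in edges:
        m[dst] += edge_w[etype] * h[src]
    return m


def gru_update(h: float, m: float, p: Dict[str, float]) -> float:
    """Scalar GRU cell, the gating mechanism used by gated graph neural networks."""
    z = sigmoid(p["wz"] * m + p["uz"] * h)             # update gate
    r = sigmoid(p["wr"] * m + p["ur"] * h)             # reset gate
    cand = math.tanh(p["wh"] * m + p["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * cand                    # blend old and candidate state


def gated_gnn_prob(h0: Dict[int, float], edges: List[Edge], edge_w: Dict[str, float],
                   p: Dict[str, float], steps: int = 3,
                   w_out: float = 1.0, b_out: float = 0.0) -> float:
    """Run `steps` rounds of gated message passing, mean-pool nodes, return a probability."""
    h = dict(h0)
    for _ in range(steps):
        m = aggregate(h, edges, edge_w)
        h = {v: gru_update(h[v], m[v], p) for v in h}
    readout = sum(h.values()) / len(h)  # mean pooling over all nodes
    return sigmoid(w_out * readout + b_out)


def interpolate(p_gnn: float, p_bert: float, lam: float) -> float:
    """Final score as a convex combination of the two classifiers' probabilities."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_gnn + (1.0 - lam) * p_bert


if __name__ == "__main__":
    # Toy graph: three nodes joined by one edge of each attribute type.
    h0 = {0: 0.5, 1: -0.2, 2: 0.1}
    edges = [(0, 1, "ast"), (1, 2, "cfg"), (0, 2, "ddg")]
    edge_w = {"ast": 0.8, "cfg": 0.5, "ddg": 1.2}
    params = dict(wz=1.0, uz=1.0, wr=1.0, ur=1.0, wh=1.0, uh=1.0)
    p_gnn = gated_gnn_prob(h0, edges, edge_w, params)
    print(p_gnn, interpolate(p_gnn, p_bert=0.9, lam=0.7))
```

Keeping separate weights per edge type is one simple way to let a model treat syntax, control-flow, and data-dependency edges differently. With `lam` fixed, the final score is a weighted average of the two probabilities. In a trained model, `lam` would instead be optimized together with the other parameters.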

Authors (7)

Ruitong Liu
Yanbin Wang
Haitao Xu
Bin Liu
Jianguo Sun
Zhenhao Guo
Wenrui Ma
