Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives (2310.01152v2)

Published 2 Oct 2023 in cs.CR and cs.AI

Abstract: This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing LLMs such as GPT-4 to dig out vulnerabilities within smart contracts based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet interesting findings: generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages, generation and discrimination, for progressive detection and refinement, wherein the LLM plays dual roles, i.e., auditor and critic, respectively. The goal of auditor is to yield a broad spectrum of vulnerabilities with the hope of encompassing the correct answer, whereas the goal of critic that evaluates the validity of identified vulnerabilities is to minimize the number of false positives. Experimental results and illustrative examples demonstrate that auditor and critic work together harmoniously to yield pronounced improvements over the conventional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven without relying on specialist expertise in smart contracts, showcasing its methodical generality and potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.

Summary

The paper presents a two-stage framework, GPTLens, that leverages separate generation and discrimination stages to enhance vulnerability detection in smart contracts.
It reports that the contract-level hit ratio roughly doubles, from 38.5% with one-stage detection to 76.9% with GPTLens, by reducing both false positives and false negatives.
The research highlights the practical implications of AI-driven smart contract auditing and sets the stage for advances in automated security verification.

LLM-Powered Smart Contract Vulnerability Detection: New Perspectives

The paper "Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives" analyzes the use of LLMs such as GPT-4 to detect vulnerabilities in smart contracts. It identifies the challenges and opportunities in applying LLMs to smart contract auditing and introduces a two-stage framework, GPTLens, to make this task more effective.

Challenges and Observations

The authors delineate several challenges associated with using LLMs for vulnerability detection in smart contracts:

False Positives: LLMs tend to generate numerous false positives, which necessitate substantial manual verification, thereby lowering the practical utility of these models.
False Negatives: LLMs may fail to identify actual vulnerabilities, reducing recall rates. Some vulnerabilities go undetected due to the randomness inherent in the generative process.
Balancing Correctness and Generality: While traditional tools rely on expert-designed patterns that offer limited scope, LLMs have the potential to generalize beyond predefined vulnerabilities. However, achieving a balance between generating correct outputs and maintaining generality remains a challenge.
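The tension between false negatives and false positives can be illustrated with a toy model. The sketch below is not from the paper: it assumes, purely for illustration, that each generated answer independently contains the true vulnerability with probability `p_hit` and contributes a fixed average number of false positives. Both parameters and both function names are hypothetical.

```python
def hit_probability(p_hit: float, n_samples: int) -> float:
    """Chance that at least one of n independent answers contains the true
    vulnerability. Independence is an assumption of this toy model."""
    return 1.0 - (1.0 - p_hit) ** n_samples


def expected_false_positives(fp_per_sample: float, n_samples: int) -> float:
    """Expected false positives, assuming each answer adds the same average
    number of spurious findings."""
    return fp_per_sample * n_samples


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the trend.
    for n in (1, 2, 4, 8):
        print(n, round(hit_probability(0.3, n), 3), expected_false_positives(2.0, n))
```

Under these assumptions the chance of covering the true vulnerability rises with the number of answers but saturates, while the expected number of false positives grows linearly. That is the gap a separate filtering step is meant to close.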

GPTLens Framework

GPTLens tackles the above challenges by employing a novel, two-stage adversarial framework:

Generation Stage: Here, the LLM functions as multiple 'auditor' agents. Each agent proposes a range of candidate vulnerabilities, aiming for high diversity in output so that the true vulnerability is likely to be among the candidates.
Discrimination Stage: In this subsequent stage, a 'critic' agent evaluates the generated vulnerabilities. It ranks the findings based on factors such as correctness, severity, and profitability. The critic’s role is to reduce false positives by separating the most plausible vulnerabilities from the rest of the generated set.
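The two stages can be sketched as a small pipeline. This is a minimal illustration, not the GPTLens implementation: the `Finding` and `Scores` types, the `detect` function, and the use of a plain mean to combine the three factors are all assumptions, and the auditors and critic are stub callables standing in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    function_name: str
    vulnerability: str
    reasoning: str
    score: float = 0.0


@dataclass
class Scores:
    correctness: float
    severity: float
    profitability: float

    def combined(self) -> float:
        # Assumption: a plain mean of the three factors.
        return (self.correctness + self.severity + self.profitability) / 3.0


Auditor = Callable[[str], List[Finding]]
Critic = Callable[[str, Finding], Scores]


def detect(source: str, auditors: List[Auditor], critic: Critic, top_k: int = 1) -> List[Finding]:
    # Generation stage: pool candidates from every auditor.
    candidates: List[Finding] = []
    for audit in auditors:
        candidates.extend(audit(source))
    # Discrimination stage: the critic scores each candidate, then we rank.
    for finding in candidates:
        finding.score = critic(source, finding).combined()
    return sorted(candidates, key=lambda f: f.score, reverse=True)[:top_k]


# Stub agents standing in for LLM calls (hypothetical outputs).
def auditor_a(source: str) -> List[Finding]:
    return [Finding("transfer", "integer overflow", "unchecked addition")]


def auditor_b(source: str) -> List[Finding]:
    return [Finding("withdraw", "reentrancy", "external call before state update")]


def toy_critic(source: str, finding: Finding) -> Scores:
    if finding.vulnerability == "reentrancy":
        return Scores(9, 8, 7)
    return Scores(3, 4, 2)


if __name__ == "__main__":
    for f in detect("contract C { }", [auditor_a, auditor_b], toy_critic):
        print(f.function_name, f.vulnerability, f.score)
```

Keeping the auditors and critic as plain callables makes the number of auditors and the `top_k` cutoff the only tuning knobs; how the real system prompts each role is outside the scope of this sketch.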

Empirical Results

The empirical evaluation involved testing on 13 smart contracts, each documented with a known vulnerability in the CVE database. The paper compared several configurations, demonstrating that the proposed GPTLens framework results in a marked improvement in vulnerability detection:

The hit ratio for identifying vulnerabilities at the contract level increased significantly. Notably, GPTLens with multiple auditors outperformed a conventional one-stage detection by almost doubling the hit ratio from 38.5% to 76.9%.
Even when considering trial-level outputs (individual generation runs), the accuracy improved from 33.3% to 59.0%, highlighting the efficacy of the two-stage strategy.
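As a sanity check on the reported percentages, the snippet below computes hit ratios as hits divided by total. Only the 13 contracts and the four percentages come from the text above. The hit counts (5 and 10 contracts) and the figure of 39 trials (three per contract) are inferred because they reproduce those percentages; they are assumptions, not numbers quoted from the paper.

```python
def hit_ratio(hits: int, total: int) -> float:
    """Hit ratio as a percentage."""
    return 100.0 * hits / total


N_CONTRACTS = 13   # stated in the text
N_TRIALS = 39      # assumption: 3 trials per contract

# Inferred counts that match the reported percentages after rounding.
contract_level = {"one-stage": 5, "GPTLens": 10}
trial_level = {"one-stage": 13, "GPTLens": 23}

if __name__ == "__main__":
    for name, hits in contract_level.items():
        print("contract-level", name, round(hit_ratio(hits, N_CONTRACTS), 1))
    for name, hits in trial_level.items():
        print("trial-level", name, round(hit_ratio(hits, N_TRIALS), 1))
```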

Theoretical and Practical Implications

This research provides essential insights into the development of AI-driven tools in the domain of smart contract auditing. The GPTLens framework suggests a path to more reliable and efficient detection processes that do not strictly rely on expert-crafted rules or predefined vulnerability types. This capability could extend to detecting novel and uncategorized vulnerabilities, thereby enhancing the robustness of smart contract security.

Speculation on Future Developments

Further progress may hinge on several directions:

Enhanced Diversity in LLM Generation: Developing new mechanisms for enhancing diversity without increasing false positives could further improve detection rates.
Improved In-Context Learning: When token limits force the critic to evaluate candidate vulnerabilities in separate batches, in-context learning could help it score consistently across those batches.
Integration with External Knowledge: Leveraging the ability of LLMs to interface with tools or databases might provide additional contextual knowledge during detection, potentially improving accuracy and reducing false positives.
Role of LLMs in Broader Software Development: The integration of LLMs in tasks ranging from code generation to automated vulnerability repair holds significant promise, possibly revolutionizing approaches to software development by incorporating AI agents as central elements.

In conclusion, the paper's findings represent a substantive contribution to smart contract vulnerability detection, illustrating the dual potential of LLMs to enhance both the breadth of coverage and accuracy of vulnerability detection systems.

GitHub

GitHub - git-disl/GPTLens: Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives (TPS23) (97 stars)