Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications (2403.16073v3)
Abstract: Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that LLMs have potential in auditing smart contracts, but the state of the art indicates that even GPT-4 achieves only 30% precision (when both the decision and the justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general text/code corpus and not fine-tuned on the specific domain of Solidity smart contract auditing. In this paper, we propose iAudit, a general framework that combines fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications. Specifically, iAudit is inspired by the observation that expert human auditors first perceive what could be wrong and then perform a detailed analysis of the code to identify the cause. As such, iAudit employs a two-stage fine-tuning approach: it first tunes a Detector model to make decisions and then tunes a Reasoner model to generate the causes of vulnerabilities. However, fine-tuning alone struggles to accurately identify the optimal cause of a vulnerability. Therefore, we introduce two LLM-based agents, the Ranker and the Critic, which iteratively select and debate the most suitable cause of vulnerability based on the output of the fine-tuned Reasoner model. To evaluate iAudit, we collected a balanced dataset with 1,734 positive and 1,810 negative samples for fine-tuning. We then compared iAudit with traditional fine-tuned models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) as well as prompt-learning-based LLMs (GPT-4, GPT-3.5, and CodeLlama-13b/34b). On a dataset of 263 real smart contract vulnerabilities, iAudit achieves an F1 score of 91.21% and an accuracy of 91.11%. The causes generated by iAudit achieve a consistency of about 38% with the ground-truth causes.
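The abstract describes a four-role pipeline (Detector, Reasoner, Ranker, Critic). Below is a minimal sketch of that control flow, assuming hypothetical callable interfaces for each role; the function names, signatures, and the `max_rounds` parameter are illustrative and are not the authors' actual API.

```python
# Hypothetical sketch of the iAudit control flow described in the abstract.
# The detector/reasoner/ranker/critic callables are assumed stand-ins, not
# the paper's real implementation.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AuditResult:
    vulnerable: bool
    cause: Optional[str]


def audit(
    code: str,
    detector: Callable[[str], bool],          # fine-tuned Detector: yes/no decision
    reasoner: Callable[[str], List[str]],     # fine-tuned Reasoner: candidate causes
    ranker: Callable[[str, List[str]], str],  # LLM agent: pick the most plausible cause
    critic: Callable[[str, str], bool],       # LLM agent: accept or reject the pick
    max_rounds: int = 3,                      # assumed cap on the debate loop
) -> AuditResult:
    """Two-stage decision/justification flow with an iterative Ranker-Critic debate."""
    # Stage 1: intuitive decision, mirroring how an auditor first "perceives" an issue.
    if not detector(code):
        return AuditResult(vulnerable=False, cause=None)

    # Stage 2: generate candidate causes, then iteratively select and debate one.
    candidates = reasoner(code)
    best: Optional[str] = None
    for _ in range(max_rounds):
        best = ranker(code, candidates)
        if critic(code, best):
            break
    return AuditResult(vulnerable=True, cause=best)
```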