Abstract

Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that LLMs have potential in auditing smart contracts, but the state of the art indicates that even GPT-4 can achieve only 30% precision (when both the decision and the justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general text/code corpus and not fine-tuned on the specific domain of Solidity smart contract auditing. In this paper, we propose TrustLLM, a general framework that combines fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications. Specifically, TrustLLM is inspired by the observation that expert human auditors first perceive what could be wrong and then perform a detailed analysis of the code to identify the cause. As such, TrustLLM employs a two-stage fine-tuning approach: it first tunes a Detector model to make decisions and then tunes a Reasoner model to generate causes of vulnerabilities. However, fine-tuning alone faces challenges in accurately identifying the optimal cause of a vulnerability. Therefore, we introduce two LLM-based agents, the Ranker and the Critic, to iteratively select and debate the most suitable cause of vulnerability based on the output of the fine-tuned Reasoner model. To evaluate TrustLLM, we collected a balanced dataset with 1,734 positive and 1,810 negative samples to fine-tune TrustLLM. We then compared it with traditional fine-tuned models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) as well as prompt-learning-based LLMs (GPT-4, GPT-3.5, and CodeLlama-13b/34b). On a dataset of 263 real smart contract vulnerabilities, TrustLLM achieves an F1 score of 91.21% and an accuracy of 91.11%. The causes generated by TrustLLM achieved a consistency of about 38% with the ground-truth causes.

Figure: Overview of TrustLLM, showing its components: Detector, Reasoner, Ranker, and Critic.

Overview

  • TrustLLM integrates fine-tuning techniques with LLMs for improved smart contract auditing, focusing on vulnerability detection and providing justifications.

  • It employs a two-stage fine-tuning approach, including Detector and Reasoner models, followed by LLM-based agents, Ranker and Critic, to refine vulnerability identification.

  • Empirical evaluation shows TrustLLM outperforms existing models in precision and justification of vulnerabilities, achieving an F1 score of 91.21% and accuracy of 91.11%.

  • The paper highlights the potential for future enhancements in automated auditing tools through further optimization and integration of contextual information.

Unified Fine-Tuning and LLM Agents for Intuitive Smart Contract Auditing with Justifications

Introduction to TrustLLM

TrustLLM represents a novel approach to auditing smart contracts, integrating fine-tuning techniques with LLMs to not only detect vulnerabilities but also provide justifications for the identified issues. Given the critical role of smart contracts in decentralized financial applications, ensuring their security is paramount. Traditional methods have shown limitations, especially against emerging, complex logic vulnerabilities. Recent work has demonstrated LLMs' potential in this domain, yet precision has remained a challenge. Through its framework, TrustLLM aims to improve detection precision and the clarity of its rationales by emulating the intuitive and analytical processes of expert human auditors.

Fine-Tuning and LLM-Based Agents Framework

TrustLLM employs a two-stage fine-tuning strategy, comprising the Detector and Reasoner models, to first decide whether a vulnerability is present and then determine its cause. This approach mimics human auditors' intuition followed by detailed analysis, aiming to improve upon the limited precision of existing solutions.

Moreover, TrustLLM introduces two LLM-based agents, the Ranker and the Critic, to refine the selection of vulnerability causes based on the Reasoner model's output. This iterative process enables a more accurate and defensible identification of smart contract vulnerabilities.
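
To make the flow concrete, below is a minimal sketch of the four-component pipeline described above. The callables and their signatures are illustrative assumptions rather than the authors' actual interfaces: the fine-tuned Detector and Reasoner would be wrapped as `detector` and `reasoner`, while the Ranker and Critic would be prompted LLM agents.

```python
# Minimal sketch of a TrustLLM-style audit pipeline (hypothetical interfaces).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AuditResult:
    vulnerable: bool
    cause: str
    rounds: int

def audit_contract(
    code: str,
    detector: Callable[[str], bool],         # fine-tuned Detector: code -> decision
    reasoner: Callable[[str], List[str]],     # fine-tuned Reasoner: code -> candidate causes
    ranker: Callable[[str, List[str]], str],  # LLM agent: picks the most plausible cause
    critic: Callable[[str, str], bool],       # LLM agent: accepts or rejects the chosen cause
    max_rounds: int = 3,
) -> AuditResult:
    """Two-stage detection/reasoning followed by an iterative Ranker-Critic debate."""
    # Stage 1: the Detector mirrors an auditor's first intuition about the code.
    if not detector(code):
        return AuditResult(vulnerable=False, cause="", rounds=0)

    # Stage 2: the Reasoner proposes candidate causes for the suspected vulnerability.
    candidates = reasoner(code)

    # Stage 3: the Ranker selects a cause and the Critic debates it; iterate until
    # the Critic accepts the explanation or the round budget is exhausted.
    chosen = ""
    for round_idx in range(1, max_rounds + 1):
        chosen = ranker(code, candidates)
        if critic(code, chosen):
            return AuditResult(vulnerable=True, cause=chosen, rounds=round_idx)
        # Drop the rejected cause so the Ranker must consider the alternatives.
        candidates = [c for c in candidates if c != chosen] or candidates
    return AuditResult(vulnerable=True, cause=chosen, rounds=max_rounds)
```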

Empirical Evaluation

The evaluation of TrustLLM involved assembling a comprehensive dataset and contrasting its performance against both traditional fine-tuned models and prompt-learning-based LLMs. The dataset featured balanced positive and negative samples, derived from reputable auditing reports and enhanced through a novel data augmentation method. TrustLLM outperformed the benchmark models, achieving an F1 score of 91.21% and an accuracy of 91.11%, with a consistency rate of about 38% between the generated causes and the ground truth. This performance underscores TrustLLM's enhanced capability for precise vulnerability detection and justification in smart contract auditing.
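
For reference, the reported accuracy and F1 score follow their standard definitions over a binary confusion matrix; the sketch below restates them, with placeholder counts rather than the paper's actual numbers.

```python
# Standard binary-classification metrics as reported in the evaluation.
# The tp/tn/fp/fn counts are placeholders, not the paper's confusion matrix.
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```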

Ablation Studies and Consideration of Call Graph Information

Ablation studies confirmed the efficacy of the two-stage fine-tuning approach and highlighted the benefit of employing multiple prompts with majority voting to achieve superior model performance (a sketch of this voting scheme follows below). Further examination revealed a nuanced impact of incorporating call graph information, suggesting both potential benefits and pitfalls depending on how it is applied within the model's reasoning process.
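
A minimal sketch of multi-prompt majority voting for the detection decision, assuming a `model` callable that answers "vulnerable" or "safe" for each prompt variant; the prompt wording here is hypothetical, not the paper's.

```python
# Hedged sketch of multi-prompt majority voting for the Detector stage.
from collections import Counter
from typing import Callable, Sequence

def detect_with_voting(
    code: str,
    model: Callable[[str], str],   # returns "vulnerable" or "safe" for one prompt
    prompts: Sequence[str],
) -> bool:
    """Query the Detector once per prompt variant and take the majority decision."""
    votes = Counter(model(prompt.format(code=code)) for prompt in prompts)
    return votes["vulnerable"] > votes["safe"]

# Example prompt variants (illustrative wording, not the paper's exact prompts).
PROMPTS = [
    "Is the following Solidity function vulnerable? Answer 'vulnerable' or 'safe'.\n{code}",
    "As a smart contract auditor, label this code 'vulnerable' or 'safe'.\n{code}",
    "Does this function contain a security flaw? Reply 'vulnerable' or 'safe'.\n{code}",
]
```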

Implications and Future Directions

TrustLLM's robust performance in detecting and justifying smart contract vulnerabilities has significant implications for both theoretical research and practical application in blockchain security. As the model demonstrates an ability to closely mirror expert human intuition and analytical rigor, it sets a promising foundation for future enhancements to automated auditing tools. Further work on optimizing the integration of contextual information and refining the iterative process among the LLM-based agents could yield even higher precision and reliability in smart contract vulnerability auditing.

Conclusion

This paper introduced TrustLLM, a pioneering framework that significantly advances the auditing of smart contracts through a synergistic combination of fine-tuned models and LLM-based agents. By effectively addressing the limitations of existing LLM applications in this domain, TrustLLM not only enhances the precision of vulnerability detection but also provides cogent justifications, marking a notable contribution to the field of decentralized application security.
