Towards AI-Assisted Synthesis of Verified Dafny Methods (2402.00247v2)

Published 1 Feb 2024 in cs.SE and cs.PL

Abstract: LLMs show great promise in many domains, including programming. A promise is easy to make but hard to keep, and LLMs often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs' specifications as well as code, so that the code can be proved correct with respect to the specifications. Unfortunately, existing LLMs show a severe lack of proficiency in verified programming. In this paper, we demonstrate how to improve two pretrained models' proficiency in the Dafny verification-aware language. Using 178 problems from the MBPP dataset, we prompt two contemporary models (GPT-4 and PaLM-2) to synthesize Dafny methods. We use three different types of prompts: a direct Contextless prompt; a Signature prompt that includes a method signature and test cases; and a Chain of Thought (CoT) prompt that decomposes the problem into steps and includes example problems and solutions generated by retrieval augmentation. Our results show that GPT-4 performs better than PaLM-2 on these tasks and that both models perform best with the retrieval-augmented CoT prompt. GPT-4 was able to generate verified, human-evaluated Dafny methods for 58% of the problems. However, GPT-4 managed only 19% of the problems with the Contextless prompt, and even fewer (10%) with the Signature prompt. We are thus able to contribute 153 verified Dafny solutions to MBPP problems: 50 that we wrote manually and 103 synthesized by GPT-4. Our results demonstrate that the benefits of formal program verification are now within reach of code generating LLMs...
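The three prompt types named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Problem` class, the function names, and the prompt wording are hypothetical and are not the paper's actual prompts. Retrieval of example problems is not implemented; the examples are assumed to arrive as already-retrieved (task, solution) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Problem:
    """Hypothetical container for one MBPP-style task (not from the paper)."""
    description: str
    signature: str = ""
    tests: List[str] = field(default_factory=list)


def contextless_prompt(p: Problem) -> str:
    # Task description only: no signature, tests, or examples.
    return f"Write a verified Dafny method for this task.\n\nTask: {p.description}\n"


def signature_prompt(p: Problem) -> str:
    # Adds the method signature and test cases to the task description.
    tests = "\n".join(p.tests)
    return (
        f"{contextless_prompt(p)}\n"
        f"Method signature:\n{p.signature}\n\n"
        f"Test cases:\n{tests}\n"
    )


def cot_prompt(p: Problem, examples: List[Tuple[str, str]]) -> str:
    # Prepends worked (task, solution) pairs, assumed to have been retrieved
    # elsewhere, then asks the model to decompose the problem into steps.
    shots = "\n\n".join(
        f"Example task: {task}\nExample solution:\n{solution}"
        for task, solution in examples
    )
    steps = (
        "Work step by step: first state the pre- and postconditions, "
        "then write the method body, then add any loop invariants."
    )
    return f"{shots}\n\n{signature_prompt(p)}\n{steps}\n"
```

Each builder extends the previous one, so the only difference between prompts is the added context, which mirrors the comparison the abstract reports across the three prompt types.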

Citations (17)

Tweets

https://twitter.com/kc_srk/status/1754157378457764344

https://twitter.com/MRHMisu/status/1753572598712307917

https://twitter.com/saikatch107/status/1781759611156222145

https://twitter.com/jameskjx/status/1758978030725378417

https://twitter.com/realmofresearch/status/1802748215537938723
