Comparing large language models and human programmers for generating programming code (2403.00894v2)

Published 1 Mar 2024 in cs.SE, cs.AI, cs.CL, and cs.PL

Abstract: We systematically evaluated the performance of seven LLMs in generating programming code using various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms other LLMs, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the optimal prompt strategy outperforms 85 percent of human participants. Additionally, GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development.



