Evaluation of ChatGPT Usability as A Code Generation Tool

(arXiv:2402.03130)
Published Feb 5, 2024 in cs.SE and cs.AI

Abstract

With the rapid advance of machine learning technology, large language models (LLMs) are increasingly being explored as intelligent tools for generating program code from natural-language specifications. However, existing evaluations of LLMs have focused on their capabilities in comparison with humans; when deciding whether to use an LLM in software production, what matters is its usability. This paper proposes a user-centric evaluation method. It includes metadata in a benchmark's test cases to describe their intended usages, conducts testing in a multi-attempt process that mimics how LLMs are used in practice, measures LLM-generated solutions on a set of quality attributes that reflect usability, and evaluates performance based on user experiences of using the LLM as a tool. The paper reports an application of the method to evaluating the usability of ChatGPT as a code generation tool for the R programming language. Our experiments demonstrate that ChatGPT is highly useful for generating R program code, although it may fail on hard programming tasks. The user experience is good, with an overall average of 1.61 attempts and an average completion time of 47.02 seconds. Our experiments also found that the weakest aspect of usability is conciseness, which scores 3.80 out of 5. Finally, our experiments show that it is hard for human developers to learn from experience and improve their skill at using ChatGPT to generate code.
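The multi-attempt protocol described in the abstract lends itself to a simple evaluation loop. The sketch below is a minimal illustration in Python, not the authors' harness: the generate and run_tests callables (an LLM wrapper and a per-task test harness) and the max_attempts cap are hypothetical stand-ins, and the loop records the two user-experience measures the abstract reports, attempt count and completion time.

```python
import time
from typing import Callable, Dict

def evaluate_task(
    generate: Callable[[str], str],    # hypothetical LLM wrapper: prompt -> generated code
    run_tests: Callable[[str], bool],  # hypothetical harness: code -> do the task's tests pass?
    prompt: str,
    max_attempts: int = 5,             # assumed cutoff; the paper's cap is not stated here
) -> Dict[str, object]:
    """Run one benchmark task in a multi-attempt process.

    Accepts the first generated solution that passes the task's test
    cases, and records the attempt count and wall-clock completion time.
    """
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)  # one attempt = one generation request
        if run_tests(code):
            return {"solved": True, "attempts": attempt,
                    "seconds": time.monotonic() - start}
        # A real user would refine the prompt with feedback here;
        # this sketch simply re-submits the same prompt.
    return {"solved": False, "attempts": max_attempts,
            "seconds": time.monotonic() - start}
```

Averaging the "attempts" and "seconds" fields over all benchmark tasks would yield summary statistics of the kind reported above (1.61 attempts, 47.02 seconds).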
