Abstract

Language model-based code completion models have quickly grown in use, helping thousands of developers write code in many different programming languages. However, research on code completion models typically focuses on imperative languages such as Python and JavaScript, which results in a lack of representation for functional programming languages. Consequently, these models often perform poorly on functional languages such as Haskell. To investigate whether this can be alleviated, we evaluate the performance of two language models for code, CodeGPT and UniXcoder, on the functional programming language Haskell. We fine-tune and evaluate the models on Haskell functions sourced from a publicly accessible Haskell dataset on HuggingFace. Additionally, we manually evaluate the models using our novel translated HumanEval dataset. Our automatic evaluation shows that knowledge of imperative programming languages in the pre-training of LLMs may not transfer well to functional languages, but that code completion on functional languages is feasible. Consequently, this shows the need for more high-quality Haskell datasets. A manual evaluation on HumanEval-Haskell indicates CodeGPT frequently generates empty predictions and extra comments, while UniXcoder more often produces incomplete or incorrect predictions. Finally, we release HumanEval-Haskell, along with the fine-tuned models and all code required to reproduce our experiments on GitHub (https://github.com/AISE-TUDelft/HaskellCCEval).

Figure: Pipeline of the methodology followed in the paper.

Overview

  • The study evaluates how well two language models, CodeGPT and UniXcoder, complete code in Haskell, a functional programming language, highlighting their strengths and limitations.

  • The work is motivated by the underrepresentation of functional languages such as Haskell in automatic code completion research, and by the potential benefits of understanding how these models perform on them.

  • The approach entails creating Haskell datasets, fine-tuning the language models on them, and evaluating performance with Exact Match and Edit Similarity to assess how well the models adapt to Haskell's syntax and semantics.

  • The results show clear gains from fine-tuning but also underline how challenging Haskell remains for these models, suggesting that Haskell-specific training data and strategies are needed for better performance.

Investigating the Performance of Language Models for Completing Code in Haskell

Introduction

This paper explores the performance of two language models, CodeGPT and UniXcoder, on automatic code completion for Haskell, a strongly typed functional programming language. While code completion for imperative languages such as Python and JavaScript has been studied extensively, functional languages like Haskell have received far less attention. This study aims to fill that gap by fine-tuning and evaluating CodeGPT and UniXcoder on Haskell code. Through evaluation on a publicly available Haskell dataset and on HumanEval-Haskell, a translation of the HumanEval benchmark into Haskell, the research scrutinizes how well these models adapt to the syntactic and semantic constructs unique to Haskell.

Motivation

Functional programming languages, particularly Haskell, present unique challenges and opportunities for automatic code completion due to their concise syntax and heavy use of type classes. This study is motivated by the underrepresentation of functional languages in existing research on code completion models: it investigates whether the knowledge language models acquire from imperative programming languages transfers effectively to a functional programming context. As functional programming concepts are increasingly integrated into mainstream languages, understanding code completion for Haskell has broader implications for improving language models across programming paradigms. The snippet below illustrates the kind of code such models must handle.
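As a brief illustration (this example is not drawn from the paper or its datasets), Haskell's terse, point-free style and type classes leave a line-completion model with relatively little local context to condition on:

```haskell
-- A small type class with a default method: idiomatic Haskell that a
-- completion model must learn to predict.
class Pretty a where
  pretty :: a -> String
  pretty = const "<opaque>"  -- default implementation

data Currency = EUR | USD deriving Show

instance Pretty Currency where
  pretty EUR = "EUR"
  pretty USD = "USD"

-- Point-free and polymorphic: each line is short, so a model completing
-- a single line sees very little surrounding context.
formatAll :: Pretty a => [a] -> String
formatAll = unwords . map pretty

main :: IO ()
main = putStrLn (formatAll [EUR, USD, EUR])
```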

Approach

The research strategy involves three principal phases: dataset creation, fine-tuning, and evaluation. First, Haskell code samples are collected and processed from the Blastwind dataset available on HuggingFace; in addition, a new dataset is created by translating the Python functions of HumanEval to Haskell. These datasets serve to fine-tune and evaluate two language models, CodeGPT and UniXcoder, both pre-trained on multiple programming languages. Fine-tuning adapts the models to line completion for Haskell, and performance is measured with Exact Match (EM) and Edit Similarity (ES).
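The paper is not quoted here with metric implementations; below is a minimal sketch, assuming EM compares whitespace-trimmed lines and ES is one minus the Levenshtein distance normalized by the longer string's length (a common formulation in line-completion work):

```haskell
import Data.Char (isSpace)

-- Exact Match: 1 if the prediction equals the ground truth after trimming.
exactMatch :: String -> String -> Double
exactMatch p g = if trim p == trim g then 1 else 0
  where trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Levenshtein distance via simple row-by-row dynamic programming.
levenshtein :: String -> String -> Int
levenshtein a b = last (foldl step [0 .. length b] a)
  where
    step row@(d:ds) c = scanl compute (d + 1) (zip3 b row ds)
      where compute left (ch, diag, up) =
              minimum [left + 1, up + 1, diag + (if ch == c then 0 else 1)]
    step [] _ = []

-- Edit Similarity: 1 - normalized Levenshtein distance, in [0, 1].
editSimilarity :: String -> String -> Double
editSimilarity p g
  | maxLen == 0 = 1
  | otherwise   = 1 - fromIntegral (levenshtein p g) / fromIntegral maxLen
  where maxLen = max (length p) (length g)

main :: IO ()
main = do
  print (exactMatch "foldr (+) 0" "foldr (+) 0")      -- 1.0
  print (editSimilarity "foldr (+) 0" "foldl (+) 0")  -- 1 - 1/11, ~0.91
```

On a line-completion benchmark, both scores are averaged over all test prompts; ES rewards near-misses that EM counts as outright failures.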

Results and Discussion

Fine-tuning significantly enhanced both models' ability to complete Haskell code, with marked improvement over their base versions. However, compared with results on imperative languages, Haskell poses a considerably greater challenge, as indicated by lower EM and ES scores. This gap underscores that models adapt differently to functional than to imperative languages, suggesting a need for Haskell-specific data or training strategies.

Manual evaluation on the translated HumanEval dataset reveals notable behavioral differences between CodeGPT and UniXcoder. CodeGPT tends to generate empty outputs or include unnecessary comments, while UniXcoder is prone to incomplete or incorrect predictions. Despite this, neither model demonstrates a consistent failure across specific Haskell features or constructs, indicating a general need for improvement in functional language support.
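As a purely hypothetical illustration (the paper's actual prompts and model outputs are not reproduced here), the described failure modes might look as follows for a simple line-completion prompt:

```haskell
-- Hypothetical line-completion example; `safeHead` is illustrative,
-- not taken from the paper's data.
safeHead :: [a] -> Maybe a
safeHead []    = Nothing
safeHead (x:_) = Just x   -- ground-truth completion of "safeHead (x:_) ="

-- CodeGPT-style failures described in the paper:
--   * an empty prediction, or
--   * an extra comment instead of code, e.g. "-- return the first element"
--
-- UniXcoder-style failures described in the paper:
--   * an incomplete prediction, e.g. "Just"
--   * an incorrect prediction, e.g. "x" (ill-typed: `Maybe a` expected)
```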

Implications

The findings highlight the potential and limitations of applying language models developed primarily for imperative languages to a functional programming context. For developers and researchers, understanding these dynamics is crucial for advancing code completion tools that support a wider range of programming languages, including Haskell. Future work should focus on developing and incorporating high-quality Haskell datasets into the training process of language models, potentially improving their performance not just in Haskell, but in understanding functional programming principles that are increasingly relevant in modern software development.

Concluding Remarks

This study illuminates the challenges and opportunities in extending language models' capabilities to Haskell code completion. While fine-tuning shows promise in enhancing model performance, the distinct nature of functional programming warrants specialized approaches for optimal code completion support. As the landscape of programming continues to evolve, with a greater fusion of imperative and functional paradigms, research such as this paves the way for more versatile and effective code completion tools that cater to the diverse needs of today's developers.
