
DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

(arXiv 2406.14867)
Published Jun 21, 2024 in cs.LG, cs.AI, and cs.CL

Abstract

LLMs have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair for low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability from a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages. To investigate this behavior, we perform a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously perceived. We hypothesize this weakness is magnified in low-resource settings where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair between high-resource and low-resource languages.

Figure: The code repair framework. An LLM generates and tests solutions, iterating repairs until the tests pass or a set number of iterations is reached.

Overview

  • The paper proposes DistiLRR, an approach using model distillation to enhance code repair for low-resource programming languages, leveraging high-quality repairs from a larger teacher model to train a smaller student model.

  • DistiLRR was evaluated across multiple programming languages and benchmarks, demonstrating significant improvements in code repair rates and reductions in syntax errors for low-resource languages compared to baseline repair methods.

  • The study underlines the potential of distillation techniques to facilitate efficient and accurate code repair without relying on extensive human-annotated datasets, suggesting future research paths for further optimization and broader application.

DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

The paper titled "DistiLRR: Transferring Code Repair for Low-Resource Programming Languages" addresses the performance disparities of LLMs on code repair tasks between high-resource programming languages (HRPLs) and low-resource programming languages (LRPLs). Traditional applications of code repair frameworks are predominantly evaluated on HRPLs such as Python, but understanding and improving their efficacy on LRPLs remains underexplored. This paper proposes the Distilling Low-Resource Repairs (DistiLRR) approach to bridge this gap by transferring the reasoning and code generation skills from a teacher model to a student model.

Methodology and Framework

Code Repair Framework: The DistiLRR methodology builds on a standard iterative code repair framework, integrating it with a distillation step where the student model learns from high-quality repairs generated by a teacher model. Specifically, the process involves the following steps (a minimal sketch follows the list):

  1. Generating initial code samples with a base model; incorrect samples enter the repair loop.
  2. Executing tests on these samples to produce error messages.
  3. Using a repair model to iteratively generate rationales and repaired programs until the tests pass or an iteration budget is exhausted.
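
A minimal sketch of this repair loop, with model calls and test execution abstracted as injected functions; the function names and prompt layout are illustrative assumptions rather than the paper's exact interface. The default of four rounds mirrors the repair budget used in the paper's evaluation.

```python
from typing import Callable, Tuple

def iterative_repair(
    prompt: str,
    tests: str,
    generate_code: Callable[[str], str],                # base model: task prompt -> program
    generate_repair: Callable[[str], Tuple[str, str]],  # repair model: prompt -> (rationale, program)
    run_tests: Callable[[str, str], Tuple[bool, str]],  # program, tests -> (passed, error message)
    max_rounds: int = 4,
) -> str:
    """Generate a program, then repeatedly test and repair it, stopping once
    the tests pass or the iteration budget is exhausted."""
    program = generate_code(prompt)                      # step 1: initial sample
    for _ in range(max_rounds):
        passed, error_msg = run_tests(program, tests)    # step 2: execute tests
        if passed:
            return program
        # step 3: the repair model rationalizes about the error and rewrites the code
        repair_prompt = f"{prompt}\n\nIncorrect program:\n{program}\n\nError:\n{error_msg}"
        _rationale, program = generate_repair(repair_prompt)
    return program
```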

Distillation Process: The core innovation of DistiLRR lies in replacing the base model in the repair role with a distillation-enhanced student model. The distillation process leverages high-quality rationales and repairs from a larger teacher model (GPT-3.5-Turbo) to train smaller student models (CodeLlama-7b-Instruct, CodeLlama-7b, and Mistral-7b). Dataset construction involves generating incorrect code, obtaining error feedback, and collecting correct repairs from the teacher model.
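
A hedged sketch of how such a fine-tuning set might be assembled from the description above; `student_generate`, `teacher_repair`, `run_tests`, and the record layout are hypothetical stand-ins, not the paper's actual pipeline. Note that only teacher repairs verified against the tests are kept as training targets, matching the "correct repairs" requirement.

```python
def build_distillation_set(tasks, student_generate, teacher_repair, run_tests):
    """Collect (repair prompt -> rationale + fixed program) pairs for fine-tuning
    the student. Each task is assumed to carry .prompt and .tests attributes."""
    dataset = []
    for task in tasks:
        program = student_generate(task.prompt)            # draft from the base model
        passed, error_msg = run_tests(program, task.tests)
        if passed:
            continue                                       # only failing programs yield repair data
        rationale, fixed = teacher_repair(task.prompt, program, error_msg)
        if run_tests(fixed, task.tests)[0]:                # keep only verified (correct) repairs
            dataset.append({
                "input": f"{task.prompt}\n\n{program}\n\n{error_msg}",
                "target": f"{rationale}\n\n{fixed}",
            })
    return dataset
```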

Experimental Setup and Baselines

The authors conduct a comprehensive evaluation on three HRPLs (Python, JavaScript, Java) and three LRPLs (Perl, Golang, Swift) across two benchmarks (MBXP and Multilingual HumanEval). They compare DistiLRR against several baselines: non-repair i.i.d. sampling, basic iterative repair with base models, in-context learning (ICL) in which the rationale is provided by the teacher model but the code is generated by the base model, and direct use of the teacher model for repairs.
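
The division of labor among these repair baselines can be made concrete with a short sketch; the object and method names below are hypothetical, but the role split follows the description above: under ICL the teacher writes the rationale while the base model writes the code, whereas a DistiLRR student produces both.

```python
def icl_repair(prompt, program, error_msg, teacher, base_model):
    # ICL baseline: the teacher model supplies the rationale...
    rationale = teacher.rationalize(prompt, program, error_msg)
    # ...but the non-fine-tuned base model must still turn it into code.
    return base_model.write_repair(prompt, program, error_msg, rationale)

def distilrr_repair(prompt, program, error_msg, student):
    # DistiLRR: the distillation-fine-tuned student produces both the
    # rationale and the repaired program in a single generation.
    rationale, fixed = student.repair(prompt, program, error_msg)
    return fixed
```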

Key Findings

  1. Initial vs. Repair Pass Rates: Four rounds of DistiLRR repair consistently outperform the initial pass@5 and often even the pass@10 results, indicating higher pass rates with fewer inference calls than non-repair sampling (a pass@k refresher follows this list).
  2. DistiLRR vs. Baselines: DistiLRR models achieve superior pass rates on LRPLs compared to ICL and base models. Specifically, DistiLRR improves pass@1 by a relative 99.5% for Perl, 112.8% for Golang, and 144.5% for Swift on Multilingual HumanEval.
  3. Rationale Quality vs. Code Correctness: The study reveals a weaker-than-expected correlation between rationale quality and the correctness of subsequent repairs. Even given a good rationale, base models often generate incorrect code, particularly in LRPLs. DistiLRR mitigates this by improving the model's responsiveness to rationale feedback.
  4. Reduction in Syntax Errors: The DistiLRR models show a marked decrease in syntax errors on LRPLs, suggesting improved model understanding of the programming languages' nuances. For HRPLs, the difference in syntax error reduction is marginal, reflecting better pre-existing model knowledge.
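
The findings above are reported as pass@k. As a refresher, here is a minimal implementation of the standard unbiased pass@k estimator introduced with the Codex evaluation (Chen et al., 2021); that this exact estimator is what the paper uses is an assumption, but it is the conventional metric for these benchmarks.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples,
    drawn from n generations of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every k-draw contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations with 3 correct gives pass@1 = 0.30 and pass@5 ~= 0.92.
print(pass_at_k(10, 3, 1))  # 0.3
print(pass_at_k(10, 3, 5))  # 0.9166...
```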

Implications and Future Directions

Implications: The paper highlights the potential of distillation to enable more efficient and accurate code repair frameworks, especially for underrepresented LRPLs. By transferring knowledge from a teacher to a student model, DistiLRR enhances repair capabilities without requiring extensive human-annotated datasets.

Future Research: Further investigations could explore scaling the fine-tuning datasets to assess the limits of DistiLRR's improvements. Additionally, evaluating the approach on more complex, reasoning-heavy code benchmarks and extending the distillation methodology to other domains within code generation and repair tasks could provide more general insights.

In sum, "DistiLRR: Transferring Code Repair for Low-Resource Programming Languages" contributes valuable insights into distillation-based approaches for improving LLM performance across diverse programming languages. This work paves the way for broader application and accessibility of high-quality code generation tools, especially benefiting languages with limited training data.
