
DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

(arXiv 2406.14867)
Published Jun 21, 2024 in cs.LG, cs.AI, and cs.CL

Abstract

LLMs have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair for low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability from a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages. To investigate this behavior, we perform a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously perceived. We hypothesize this weakness is magnified in low-resource settings where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair between high-resource and low-resource languages.

Figure: The code repair framework. An LLM generates and tests solutions, iterating repairs until the tests pass or a set number of iterations is reached.

Overview

  • The paper proposes DistiLRR, an approach using model distillation to enhance code repair for low-resource programming languages, leveraging high-quality repairs from a larger teacher model to train a smaller student model.

  • DistiLRR was evaluated across multiple programming languages and benchmarks, demonstrating significant improvements in code repair rates and reductions in syntax errors for low-resource languages compared to baseline repair methods.

  • The study underlines the potential of distillation techniques to facilitate efficient and accurate code repair without relying on extensive human-annotated datasets, suggesting future research paths for further optimization and broader application.

DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

The paper titled "DistiLRR: Transferring Code Repair for Low-Resource Programming Languages" addresses the performance disparities of LLMs on code repair tasks between high-resource programming languages (HRPLs) and low-resource programming languages (LRPLs). Traditional applications of code repair frameworks are predominantly evaluated on HRPLs such as Python, but understanding and improving their efficacy on LRPLs remains underexplored. This paper proposes the Distilling Low-Resource Repairs (DistiLRR) approach to bridge this gap by transferring the reasoning and code generation skills from a teacher model to a student model.

Methodology and Framework

Code Repair Framework: The DistiLRR methodology builds on a standard iterative code repair framework, integrating it with a distillation step where the student model learns from high-quality repairs generated by a teacher model. Specifically, the process involves the following steps (a minimal sketch follows the list):

  1. Generating initial code samples with a base model; incorrect samples enter the repair loop.
  2. Executing tests on these samples to produce error messages.
  3. Using a repair model to iteratively generate rationales and repaired programs until the tests pass or an iteration budget is exhausted.
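
A minimal sketch of this repair loop, with model calls and test execution abstracted as injected functions; the function names and prompt layout are illustrative assumptions rather than the paper's exact interface. The default of four rounds mirrors the repair budget used in the paper's evaluation.

```python
from typing import Callable, Tuple

def iterative_repair(
    prompt: str,
    tests: str,
    generate_code: Callable[[str], str],                # base model: task prompt -> program
    generate_repair: Callable[[str], Tuple[str, str]],  # repair model: prompt -> (rationale, program)
    run_tests: Callable[[str, str], Tuple[bool, str]],  # program, tests -> (passed, error message)
    max_rounds: int = 4,
) -> str:
    """Generate a program, then repeatedly test and repair it, stopping once
    the tests pass or the iteration budget is exhausted."""
    program = generate_code(prompt)                      # step 1: initial sample
    for _ in range(max_rounds):
        passed, error_msg = run_tests(program, tests)    # step 2: execute tests
        if passed:
            return program
        # step 3: the repair model rationalizes about the error and rewrites the code
        repair_prompt = f"{prompt}\n\nIncorrect program:\n{program}\n\nError:\n{error_msg}"
        _rationale, program = generate_repair(repair_prompt)
    return program
```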

Distillation Process: The core innovation of DistiLRR lies in replacing the base model in the repair role with a distillation-enhanced student model. The distillation process leverages high-quality rationales and repairs from a larger teacher model (GPT-3.5-Turbo) to train smaller student models (CodeLlama-7b-Instruct, CodeLlama-7b, and Mistral-7b). Dataset construction involves generating incorrect code, obtaining error feedback, and collecting correct repairs from the teacher model.
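
A hedged sketch of how such a fine-tuning set might be assembled from the description above; `student_generate`, `teacher_repair`, `run_tests`, and the record layout are hypothetical stand-ins, not the paper's actual pipeline. Note that only teacher repairs verified against the tests are kept as training targets, matching the "correct repairs" requirement.

```python
def build_distillation_set(tasks, student_generate, teacher_repair, run_tests):
    """Collect (repair prompt -> rationale + fixed program) pairs for fine-tuning
    the student. Each task is assumed to carry .prompt and .tests attributes."""
    dataset = []
    for task in tasks:
        program = student_generate(task.prompt)            # draft from the base model
        passed, error_msg = run_tests(program, task.tests)
        if passed:
            continue                                       # only failing programs yield repair data
        rationale, fixed = teacher_repair(task.prompt, program, error_msg)
        if run_tests(fixed, task.tests)[0]:                # keep only verified (correct) repairs
            dataset.append({
                "input": f"{task.prompt}\n\n{program}\n\n{error_msg}",
                "target": f"{rationale}\n\n{fixed}",
            })
    return dataset
```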

Experimental Setup and Baselines

The authors conduct a comprehensive evaluation on three HRPLs (Python, JavaScript, Java) and three LRPLs (Perl, Golang, Swift) across two benchmarks (MBXP and Multilingual HumanEval). They compare DistiLRR against several baselines: non-repair i.i.d. sampling, basic iterative repair with base models, in-context learning (ICL) in which the rationale is provided by the teacher model but the code is generated by the base model, and direct use of the teacher model for repairs.
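
The division of labor among these repair baselines can be made concrete with a short sketch; the object and method names below are hypothetical, but the role split follows the description above: under ICL the teacher writes the rationale while the base model writes the code, whereas a DistiLRR student produces both.

```python
def icl_repair(prompt, program, error_msg, teacher, base_model):
    # ICL baseline: the teacher model supplies the rationale...
    rationale = teacher.rationalize(prompt, program, error_msg)
    # ...but the non-fine-tuned base model must still turn it into code.
    return base_model.write_repair(prompt, program, error_msg, rationale)

def distilrr_repair(prompt, program, error_msg, student):
    # DistiLRR: the distillation-fine-tuned student produces both the
    # rationale and the repaired program in a single generation.
    rationale, fixed = student.repair(prompt, program, error_msg)
    return fixed
```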

Key Findings

  1. Initial vs. Repair Pass Rates: Four rounds of DistiLRR repair consistently outperform the initial pass@5 and often even the pass@10 results, indicating higher pass rates with fewer inference calls than non-repair sampling (a pass@k refresher follows this list).
  2. DistiLRR vs. Baselines: DistiLRR models achieve superior pass rates on LRPLs compared to ICL and base models. Specifically, DistiLRR improves pass@1 by a relative 99.5% for Perl, 112.8% for Golang, and 144.5% for Swift on Multilingual HumanEval.
  3. Rationale Quality vs. Code Correctness: The study reveals a weaker-than-expected correlation between rationale quality and the correctness of subsequent repairs. Even given a good rationale, base models often generate incorrect code, particularly in LRPLs. DistiLRR mitigates this by improving the model's responsiveness to rationale feedback.
  4. Reduction in Syntax Errors: The DistiLRR models show a marked decrease in syntax errors on LRPLs, suggesting improved model understanding of the programming languages' nuances. For HRPLs, the difference in syntax error reduction is marginal, reflecting better pre-existing model knowledge.
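
The findings above are reported as pass@k. As a refresher, here is a minimal implementation of the standard unbiased pass@k estimator introduced with the Codex evaluation (Chen et al., 2021); that this exact estimator is what the paper uses is an assumption, but it is the conventional metric for these benchmarks.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples,
    drawn from n generations of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every k-draw contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations with 3 correct gives pass@1 = 0.30 and pass@5 ~= 0.92.
print(pass_at_k(10, 3, 1))  # 0.3
print(pass_at_k(10, 3, 5))  # 0.9166...
```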

Implications and Future Directions

Implications: The paper highlights the potential of distillation to enable more efficient and accurate code repair frameworks, especially for underrepresented LRPLs. By transferring knowledge from a teacher to a student model, DistiLRR enhances repair capabilities without requiring extensive human-annotated datasets.

Future Research: Further investigations could explore scaling the fine-tuning datasets to assess the limits of DistiLRR's improvements. Additionally, evaluating the approach on more complex, reasoning-heavy code benchmarks and extending the distillation methodology to other domains within code generation and repair tasks could provide more general insights.

In sum, "DistiLRR: Transferring Code Repair for Low-Resource Programming Languages" contributes valuable insights into distillation-based approaches for improving LLM performance across diverse programming languages. This work paves the way for broader application and accessibility of high-quality code generation tools, especially benefiting languages with limited training data.
