Prompt-based Code Completion via Multi-Retrieval Augmented Generation (2405.07530v1)

Published 13 May 2024 in cs.SE

Abstract: Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained LLMs. However, these models often suffer from coherence issues and hallucinations when dealing with complex code logic or extrapolating beyond their training data. Existing Retrieval Augmented Generation (RAG) techniques partially address these issues by retrieving relevant code with a separate encoding model where the retrieved snippet serves as contextual reference for code completion. However, their retrieval scope is subject to a singular perspective defined by the encoding model, which largely overlooks the complexity and diversity inherent in code semantics. To address this limitation, we propose ProCC, a code completion framework leveraging prompt engineering and the contextual multi-armed bandits algorithm to flexibly incorporate and adapt to multiple perspectives of code. ProCC first employs a prompt-based multi-retriever system which crafts prompt templates to elicit LLM knowledge to understand code semantics with multiple retrieval perspectives. Then, it adopts the adaptive retrieval selection algorithm to incorporate code similarity into the decision-making process to determine the most suitable retrieval perspective for the LLM to complete the code. Experimental results demonstrate that ProCC outperforms state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite and 10.1% on the private-domain benchmark suite collected from a billion-user e-commerce company in terms of Exact Match. ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding 5.6% improvement over our studied fine-tuned model.

Citations (5)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/_reachsumit/status/1790245164445020457

https://twitter.com/realmofresearch/status/1791500571297685958

https://twitter.com/gm8xx8/status/1790364899542986949

https://twitter.com/ComputerPapers/status/1790339741935628596

Prompt-based Code Completion via Multi-Retrieval Augmented Generation (2405.07530v1)

Summary

Related Papers

Tweets