Large Language Models for Software Engineering: A Systematic Literature Review (2308.10620v6)

Published 21 Aug 2023 in cs.SE and cs.AI

Abstract: LLMs have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early stages. To bridge this gap, we conducted a systematic literature review (SLR) on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We select and analyze 395 research papers from January 2017 to January 2024 to answer four key research questions (RQs). In RQ1, we categorize different LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used in data collection, preprocessing, and application, highlighting the role of well-curated datasets for successful LLM for SE implementation. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. From the answers to these RQs, we discuss the current state-of-the-art and trends, identifying gaps in existing research, and flagging promising areas for future study. Our artifacts are publicly available at https://github.com/xinyi-hou/LLM4SE_SLR.

Summary

The paper categorizes LLM architectures across 395 studies and shows that decoder-only models excel in code generation tasks.
It details data handling techniques such as tokenization and normalization that boost LLM performance in various SE tasks.
It discusses optimization methods like PEFT and prompt engineering while outlining challenges and future research directions in SE.

Overview of "Large Language Models for Software Engineering: A Systematic Literature Review"

Hou et al. systematically review the integration of LLMs into Software Engineering (SE), an area of growing research and practical interest. The review examines the role, effectiveness, and application of LLMs in SE through an analysis of 395 papers published between January 2017 and January 2024. It is organized around four research questions: which LLMs are used for SE tasks, how data is collected and preprocessed, how models are optimized and evaluated, and which SE tasks have benefited so far.

Categorization and Utilization of LLMs

The paper groups LLMs by architecture into encoder-only, encoder-decoder, and decoder-only models. Encoder-only models, such as BERT and its derivatives, are used mainly for code understanding tasks. Encoder-decoder models, such as T5, suit tasks that transform one sequence into another, such as translation. Decoder-only models, such as GPT-3 and Codex, excel at code generation, producing syntactically and contextually rich outputs. The review finds that decoder-only LLMs are the most widely used architecture in SE, a prominence it links to how well their autoregressive generation fits code generation.
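As a rough illustration of the architectural distinction described above (a sketch that is not taken from the paper), the difference between encoder-style and decoder-style models can be pictured through their attention masks. Encoder-only models typically let every token attend to every other token, while decoder-only models use a causal mask so that each token sees only itself and earlier tokens, which is what makes left-to-right generation possible. The function names below are hypothetical and chosen only for this example.

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
```

In an encoder-decoder model, the encoder side would usually use the first kind of mask and the decoder side the second, with the decoder additionally attending to the encoder's output.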

Data Handling in LLM4SE

Data is central to deploying LLMs. The review reports a heavy reliance on text-based and source code datasets drawn from open-source repositories, from data that researchers collected themselves on platforms like GitHub, and from industrial sources. Preprocessing typically involves tokenization, normalization, and conversion into a model-ready representation, all of which affect how efficiently an LLM trains and runs. Well-processed datasets significantly improve the model's understanding and performance.
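To make the preprocessing steps more concrete, here is a minimal sketch of normalization followed by tokenization for a Python-like code snippet. It is an assumption-laden toy, not the pipeline used by any surveyed study: production LLM pipelines generally use learned subword tokenizers, and the regular expression and function names here are hypothetical.

```python
import re
from typing import List

# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> str:
    """Drop '#' line comments, collapse whitespace runs, remove blank lines.

    Naive by design: a '#' inside a string literal is also treated as a comment,
    and indentation is discarded.
    """
    cleaned = []
    for line in source.splitlines():
        line = line.split("#", 1)[0]   # strip trailing comment
        line = " ".join(line.split())  # collapse runs of whitespace
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


def tokenize(source: str) -> List[str]:
    """Split normalized source into coarse lexical tokens."""
    return TOKEN_RE.findall(source)
```

For example, `tokenize(normalize("y = x+2  # add"))` yields the tokens `y`, `=`, `x`, `+`, `2`. The final "representation" step (mapping tokens to integer IDs or embeddings) is omitted because it depends on the model's vocabulary.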

Optimization and Evaluation Techniques

To optimize LLM performance in SE, the surveyed literature uses Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and prompt tuning. Prompt engineering is another significant approach: few-shot and zero-shot prompting adapt an LLM to a task without retraining it. Evaluation metrics are tailored to specific SE tasks, with BLEU, CodeBLEU, and Exact Match commonly used for code generation. Together, these practices reflect the growing complexity of adapting LLMs to SE-specific challenges.
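The two ideas above that are simplest to show in code are few-shot prompting and the Exact Match metric. The sketch below assumes a hypothetical "### Task / ### Solution" prompt layout (real studies use many different templates) and defines Exact Match as the fraction of predictions that equal their reference after trimming surrounding whitespace. Passing an empty example list gives a zero-shot prompt.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Concatenate (task, solution) demonstrations, then the unsolved query.

    The '### Task' / '### Solution' layout is an illustrative assumption.
    """
    parts = [
        f"### Task\n{task}\n### Solution\n{solution}\n"
        for task, solution in examples
    ]
    parts.append(f"### Task\n{query}\n### Solution\n")
    return "\n".join(parts)


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference (whitespace-trimmed)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Exact Match is strict: two functionally equivalent programs that differ by a variable name score zero, which is one reason token-overlap metrics such as BLEU and CodeBLEU are reported alongside it.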

Applications and Impact on SE Tasks

LLMs are applied across many SE tasks, most heavily in software development and maintenance: code generation, code summarization, and bug fixing. The review reports that LLMs accelerate development and improve code quality by automating repetitive tasks. Their use in other phases, such as requirements engineering and software design, remains nascent and is a candidate area for further exploration.

Challenges and Future Directions

Despite the evident benefits, challenges persist, notably concerning the size and computational demands of LLMs, data dependencies, and model generalization. There is also an ongoing need for improving interpretability and trustworthiness in LLM-generated outputs to foster adoption in industry practices. Such challenges call for continued research into more efficient model architectures and innovative training approaches.

Conclusion

Overall, the paper by Hou et al. surveys the integration of LLMs into SE, drawing attention to current applications, practical challenges, and future research directions. The review emphasizes the transformative potential of LLMs in SE while highlighting the groundwork still needed to realize it. As the field progresses, the paper serves as a foundational reference for practitioners and researchers aiming to advance the use and efficacy of LLMs in SE.
