Emergent Mind

Agentless: Demystifying LLM-based Software Engineering Agents

(2407.01489)
Published Jul 1, 2024 in cs.SE, cs.AI, cs.CL, and cs.LG

Abstract

Recent advancements in LLMs have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run commands, observe feedback from the environment, and plan for future actions. However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents? To attempt to answer this question, we build Agentless -- an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic two-phase process of localization followed by repair, without letting the LLM decide future actions or operate with complex tools. Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance (27.33%) and lowest cost ($0.34) compared with all existing open-source software agents! Furthermore, we manually classified the problems in SWE-bench Lite and found problems with exact ground truth patch or insufficient/misleading issue descriptions. As such, we construct SWE-bench Lite-S by excluding such problematic issues to perform more rigorous evaluation and comparison. Our work highlights the current overlooked potential of a simple, interpretable technique in autonomous software development. We hope Agentless will help reset the baseline, starting point, and horizon for autonomous software agents, and inspire future work along this crucial direction.

Agentless architecture overview.

Overview

  • The paper 'Agentless: Demystifying LLM-based Software Engineering Agents,' authored by Chunqiu Steven Xia et al., introduces a simplified agentless methodology for software development using a two-phase process of localization and repair.

  • The 'Agentless' approach achieves both the highest performance and the lowest cost among existing open-source agent-based techniques, resolving 27.33% of issues on the SWE-bench Lite benchmark at an average cost of $0.34 per issue.

  • The authors manually classify the problems in the SWE-bench Lite dataset and propose a refined subset, SWE-bench Lite-S, as a more rigorous benchmark for evaluating future developments.

An Analysis of "Agentless: Demystifying LLM-based Software Engineering Agents"

The paper "Agentless: Demystifying LLM-based Software Engineering Agents," authored by Chunqiu Steven Xia et al., examines the strengths and pitfalls of current LLM agent-based approaches in software engineering. It introduces "Agentless," a methodology that solves software development problems through a two-phase process of localization and repair, without letting the LLM decide future actions or operate complex tools.

Key Contributions

The central contribution of this work is the demonstration that Agentless, despite its simplicity, outperforms existing open-source agent-based techniques on practical software development tasks. Notable aspects include:

  1. Simplified Process:

    • Localization Phase: Employs a hierarchical method to localize faults. This begins by identifying suspicious files, followed by narrowing down to relevant classes/functions, and finally pinpointing specific lines or sections needing edits.
    • Repair Phase: Uses LLMs to generate candidate patches in a simple diff format. The candidates are filtered by syntax checks and regression tests, and the final patch is selected by majority voting.
  2. Performance Evaluation:

    • Benchmark Comparison: Evaluated on SWE-bench Lite, Agentless resolves 27.33% of issues, surpassing the open-source agent-based approaches it is compared against.
    • Cost Efficiency: The average cost per issue is $0.34, the lowest among those open-source agents, which shows the economic appeal of simpler frameworks when deploying LLMs for software tasks.
  3. Manual Problem Classification:

    • The authors manually classified the problems in the SWE-bench Lite dataset and found issues whose descriptions contain the exact ground truth patch, as well as issues with misleading or insufficient descriptions.
    • The refined subset, SWE-bench Lite-S, excludes these problematic issues to provide a cleaner, more rigorous benchmark for future developments.

Detailed Insights

Localization and Repair Process

Agentless structures localization and repair as a fixed sequence of narrowing steps:

  • Hierarchical Localization: By converting the project codebase into a structured format and successively narrowing down to specific edit locations, Agentless avoids unnecessary computational overhead.
    • File-Level: Initial identification to isolate suspicious files.
    • Class/Function-Level: Skeleton extraction to filter through possibly expansive files.
    • Line-Level: Precision narrowing for direct fault edits.
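
The class/function-level step depends on compressing a large file into a skeleton that still fits in a prompt. The following is a minimal sketch of that idea for Python sources, using the standard `ast` module. The function name `file_skeleton` and the exact output layout are assumptions made for illustration; the summary does not specify the skeleton format Agentless actually uses.

```python
import ast


def file_skeleton(source: str) -> str:
    """Return class and function headers of a Python file with bodies elided.

    Illustrative only: parameter lists show plain positional names, and
    functions nested inside other functions are not listed.
    """
    lines = []

    def visit(body, depth):
        indent = "    " * depth
        for node in body:
            if isinstance(node, ast.ClassDef):
                lines.append(f"{indent}class {node.name}:")
                visit(node.body, depth + 1)  # list the methods of the class
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                params = ", ".join(arg.arg for arg in node.args.args)
                lines.append(f"{indent}def {node.name}({params}): ...")

    visit(ast.parse(source).body, 0)
    return "\n".join(lines)


EXAMPLE_SOURCE = (
    "class A:\n"
    "    def f(self, x):\n"
    "        return x\n"
    "\n"
    "def g(a, b):\n"
    "    return a + b\n"
)
print(file_skeleton(EXAMPLE_SOURCE))
```

A skeleton like this lets a model choose which classes or functions to inspect in full without reading every function body first.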

Patch Generation and Filtering:

  • Diff Format Generation: Generates patches in a search/replace diff format rather than rewriting entire code segments, which reduces error rates and keeps each patch focused on the relevant code.
  • Filtering and Ranking: Filters candidates with syntax checks and regression tests, then selects the final patch by majority voting among the candidates that remain.
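
The filter-then-vote step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the helper names (`apply_search_replace`, `select_patch`), the rule that a search string must match exactly once, and the `ast.unparse` normalization used to group equivalent patches are all hypothetical choices. The regression-test check is passed in as a callable because running a real test suite is outside the scope of the sketch. `ast.unparse` requires Python 3.9 or later.

```python
import ast
from collections import Counter


def apply_search_replace(source, search, replace):
    """Apply one search/replace edit; return None if it does not apply cleanly."""
    if source.count(search) != 1:  # assumption: require a unique match
        return None
    return source.replace(search, replace, 1)


def is_valid_python(source):
    """Syntax filter: keep only patched files that still parse."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def normalize(source):
    """Reduce formatting differences so equivalent patches share one vote."""
    return ast.unparse(ast.parse(source))


def select_patch(original, candidates, passes_regression=lambda src: True):
    """Filter candidate (search, replace) edits, then pick the majority result."""
    votes = Counter()
    first_seen = {}
    for search, replace in candidates:
        patched = apply_search_replace(original, search, replace)
        if patched is None or not is_valid_python(patched):
            continue
        if not passes_regression(patched):
            continue
        key = normalize(patched)
        votes[key] += 1
        first_seen.setdefault(key, patched)
    if not votes:
        return None
    winner, _ = votes.most_common(1)[0]
    return first_seen[winner]


ORIGINAL = "def add(a, b):\n    return a - b\n"
CANDIDATES = [
    ("return a - b", "return a + b"),
    ("return a - b", "return a+b"),    # same fix, different spacing
    ("return a - b", "return a +"),    # rejected by the syntax filter
    ("return x", "return y"),          # search text not found
    ("return a - b", "return b - a"),  # valid but receives a single vote
]
print(select_patch(ORIGINAL, CANDIDATES))
```

In this toy run, two candidates normalize to the same patched file and therefore win the vote over the single dissenting candidate.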

Efficiency and Comparative Analysis

Performance Metrics:

  • Agentless resolves 27.33% of SWE-bench Lite issues, the highest rate among the open-source LLM-based agents compared in the study.
  • High localization accuracy (77.7% at the file level and 50.8% at the line level) reduces the inefficiency of broader LLM-based methods, which may run excessive localization cycles.
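
One way to compute a localization accuracy figure of this kind is sketched below. The definition used here is an assumption: an issue counts as correctly localized when the predicted locations include every location touched by the ground truth patch. The summary does not state the paper's exact criterion, and the issue IDs and file names are made-up toy data.

```python
def localization_accuracy(predicted, ground_truth):
    """Fraction of issues whose predicted locations cover all true edit locations.

    Both arguments map an issue ID to a set of locations (for example, file
    paths for file-level accuracy or "path:line" strings for line-level).
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one issue")
    hits = sum(
        1
        for issue, truth in ground_truth.items()
        if truth <= predicted.get(issue, set())  # subset test: all edits covered
    )
    return hits / len(ground_truth)


# Hypothetical toy data, not taken from SWE-bench Lite.
TRUTH = {
    "issue-1": {"pkg/a.py"},
    "issue-2": {"pkg/b.py", "pkg/c.py"},
    "issue-3": {"pkg/d.py"},
    "issue-4": {"pkg/e.py"},
}
PREDICTED = {
    "issue-1": {"pkg/a.py", "pkg/z.py"},  # covers the edit, extra file is allowed
    "issue-2": {"pkg/b.py"},              # misses pkg/c.py
    "issue-3": {"pkg/d.py"},
}
print(localization_accuracy(PREDICTED, TRUTH))
```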

Cost and Token Efficiency:

  • At an average cost of $0.34 per issue, Agentless is the lowest-cost approach among the open-source agents compared.
  • Token usage is modest: the approach avoids the heavy token consumption common in more complex agent-based approaches.

Benchmarks and Future Implications

SWE-bench Lite and SWE-bench Lite-S:

  • SWE-bench Lite-S, obtained by filtering out the problematic issues, gives a cleaner and more rigorous basis for evaluating autonomous techniques.
  • The problems found in the original set underscore the need for well-annotated, accurately described benchmarks for fair performance comparisons.

Future Directions:

  • Combining the simplicity of Agentless with some strategic advances from agent-based systems might enhance the approach further.
  • Improvements in hierarchical search methods and better self-reflection modules are potential areas for enhancing LLM efficacy in software engineering tasks.

Conclusion

The "Agentless" paper supports the premise that simpler, well-structured approaches can be highly effective in practical software engineering. Its two-phase method, which uses LLMs for localization and repair, achieves higher performance and lower cost than the open-source agent-based systems it is compared against. The insights gained through problem classification and the construction of SWE-bench Lite-S provide a solid foundation for future research in this area and reinforce the potential of minimalistic tools to set new standards in autonomous software development.
