AutoCodeRover: Autonomous Program Improvement (2404.05427v3)
Abstract: Researchers have made significant progress in automating the software development process over the past decades. Recent advances in large language models (LLMs) have significantly changed that process: developers can now use LLM-based programming assistants to automate coding. Nevertheless, software engineering involves program improvement in addition to coding, specifically software maintenance (e.g., bug fixing) and software evolution (e.g., feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach, called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (the abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure, in the form of classes and methods, to enhance the LLM's understanding of the issue's root cause and to retrieve context effectively via iterative search. Spectrum-based fault localization using tests further sharpens the context, as long as a test suite is available. Experiments on SWE-bench-lite (300 real-life GitHub issues) show increased efficacy in solving GitHub issues (19%), which is higher than the efficacy of the recently reported SWE-agent. In addition, AutoCodeRover achieved this efficacy at significantly lower cost (on average, $0.43) than other baselines. We posit that our workflow enables autonomous software engineering, where, in the future, auto-generated code from LLMs can be autonomously improved.
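To make the idea of searching over program structure rather than raw files concrete, here is a minimal sketch using Python's standard `ast` module. It is not AutoCodeRover's implementation: the helper names `find_method_in_class` and `list_class_methods`, and the toy `SOURCE` snippet, are hypothetical and chosen only for illustration. The sketch assumes the kind of query an LLM agent might issue during iterative context retrieval, such as "show me method `total` of class `Cart`".

```python
import ast
from typing import List, Optional

# Toy source standing in for a project file (hypothetical example).
SOURCE = '''
class Cart:
    def __init__(self):
        self.items = []

    def total(self):
        return sum(price for _, price in self.items)

def helper():
    return 1
'''


def list_class_methods(source: str, class_name: str) -> List[str]:
    """Return the names of methods defined directly in `class_name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return []


def find_method_in_class(
    source: str, class_name: str, method_name: str
) -> Optional[str]:
    """Return the source text of `class_name.method_name`, or None if absent."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_function = isinstance(
                    item, (ast.FunctionDef, ast.AsyncFunctionDef)
                )
                if is_function and item.name == method_name:
                    # get_source_segment slices the original text using the
                    # node's recorded line and column offsets.
                    return ast.get_source_segment(source, item)
    return None


if __name__ == "__main__":
    # First ask for the class outline, then drill into one method.
    print(list_class_methods(SOURCE, "Cart"))
    print(find_method_in_class(SOURCE, "Cart", "total"))
```

Returning only the matching method body, instead of the whole file, keeps the retrieved context small. That is the motivation the abstract gives for working on the abstract syntax tree, although the actual search interface may differ from this sketch.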