
AutoDev: Automated AI-Driven Development (2403.08299v1)

Published 13 Mar 2024 in cs.SE and cs.AI

Abstract: The landscape of software development has witnessed a paradigm shift with the advent of AI-powered assistants, exemplified by GitHub Copilot. However, existing solutions are not leveraging all the potential capabilities available in an IDE such as building, testing, executing code, git operations, etc. Therefore, they are constrained by their limited capabilities, primarily focusing on suggesting code snippets and file manipulation within a chat-based interface. To fill this gap, we present AutoDev, a fully automated AI-driven software development framework, designed for autonomous planning and execution of intricate software engineering tasks. AutoDev enables users to define complex software engineering objectives, which are assigned to AutoDev's autonomous AI Agents to achieve. These AI agents can perform diverse operations on a codebase, including file editing, retrieval, build processes, execution, testing, and git operations. They also have access to files, compiler output, build and testing logs, static analysis tools, and more. This enables the AI Agents to execute tasks in a fully automated manner with a comprehensive understanding of the contextual information required. Furthermore, AutoDev establishes a secure development environment by confining all operations within Docker containers. This framework incorporates guardrails to ensure user privacy and file security, allowing users to define specific permitted or restricted commands and operations within AutoDev. In our evaluation, we tested AutoDev on the HumanEval dataset, obtaining promising results with 91.5% and 87.8% of Pass@1 for code generation and test generation respectively, demonstrating its effectiveness in automating software engineering tasks while maintaining a secure and user-controlled development environment.


Summary

  • The paper introduces AutoDev, a novel framework that enables fully automated planning and execution of software development tasks beyond basic code suggestions.
  • It employs a multi-agent architecture with a Conversation Manager, Agent Scheduler, and secure Docker-based Tools Library to enhance task efficiency.
  • Empirical results show impressive performance, with a Pass@1 of 91.5% for code generation and 87.8% for test generation on the HumanEval dataset.

AutoDev: Automated AI-Driven Development

AutoDev introduces a novel framework designed to address the limitations of existing AI-powered software development tools. Conventional solutions such as GitHub Copilot focus predominantly on suggesting code snippets and manipulating files within a chat-based interface. In contrast, AutoDev facilitates fully automated planning and execution of software engineering tasks, extending capabilities beyond code suggestion to encompass a broad range of repository operations.

Framework Overview

AutoDev unlocks the potential of AI agents to perform software development tasks autonomously. Its primary components are the Conversation Manager, Agent Scheduler, Tools Library, and Evaluation Environment (Figure 1).

Figure 1: Overview of the AutoDev Framework: AI agents collaborate via a Conversation Manager, using the Tools Library to execute commands in a secure Docker environment.

AutoDev's Conversation Manager maintains the ongoing dialogue between agents and the repository, including initializing context, validating actions, and concluding processes. Configuration settings define permissible operations, agent capabilities, and task objectives; a sketch of such a configuration appears below. This architecture supports multi-agent collaboration within defined rules, enhancing task execution efficiency.
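
For illustration, the following is a minimal sketch of what such a configuration might look like. The paper does not publish a concrete schema, so every field name and value below is a hypothetical assumption:

```python
# Hypothetical AutoDev-style configuration. The paper describes user-defined
# rules for permitted/restricted commands, agent capabilities, and task
# objectives; this sketch merely makes those ideas concrete.
autodev_config = {
    "objective": "Add unit tests for the parser module and make them pass",
    "agents": [
        {"name": "developer", "model": "gpt-4", "can_edit_files": True},
        {"name": "reviewer", "model": "gpt-4", "can_edit_files": False},
    ],
    # Guardrails: only these Tools Library commands may be issued.
    "permitted_commands": ["retrieve", "write", "build", "test", "git_diff"],
    "restricted_commands": ["git_push", "shell"],
    # All execution is confined to a Docker container (see below).
    "docker_image": "autodev-eval:latest",
}
```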

The Agent Scheduler coordinates the AI agents, applying strategies such as round-robin and token-based collaboration to allocate tasks efficiently. Individual agents use LLMs to process tasks, drawing commands from the Tools Library, which abstracts complex software operations into a simplified command set. A minimal scheduling sketch follows.
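
This sketch shows round-robin turn-taking only, assuming hypothetical `Agent` and `Conversation` interfaces (AutoDev's actual APIs are not published):

```python
from collections import deque

def run_round_robin(agents, conversation, max_turns=20):
    """Rotate turns among agents until the objective is met or the budget runs out."""
    queue = deque(agents)
    for _ in range(max_turns):
        agent = queue[0]
        action = agent.next_action(conversation)   # LLM call under the hood
        result = conversation.execute(action)      # dispatched via the Tools Library
        conversation.append(agent.name, action, result)
        if conversation.objective_met():
            return conversation
        queue.rotate(-1)                           # hand the turn to the next agent
    raise TimeoutError("objective not met within the turn budget")
```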

The Evaluation Environment enables secure execution of actions through Docker containers, ensuring system integrity during code modification, testing, and execution.
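
One plausible realization of this sandboxing, sketched under the assumption of a prebuilt evaluation image and a locally mounted repository (the paper specifies neither), is to shell out to `docker run` with networking disabled:

```python
import subprocess

def run_in_container(repo_path: str, command: list[str]) -> subprocess.CompletedProcess:
    """Run a build, test, or execution command inside an isolated container."""
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",              # guardrail: no outbound network access
            "-v", f"{repo_path}:/workspace",  # mount the repository under test
            "-w", "/workspace",
            "autodev-eval:latest",            # hypothetical evaluation image
            *command,
        ],
        capture_output=True, text=True, timeout=300,
    )

# Example: run the test suite inside the sandbox.
# result = run_in_container("/path/to/repo", ["pytest", "-q"])
```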

Empirical Evaluation

The empirical evaluation of AutoDev on the HumanEval dataset underscores its effectiveness. AutoDev achieves a Pass@1 of 91.5% for code generation, a marked improvement over the baseline GPT-4 model, which yields 67% (Figure 2).

Figure 2: AutoDev enables an AI Agent to achieve a given objective by performing several actions within the repository.
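
For reference, Pass@1 here follows the standard HumanEval pass@k protocol. The unbiased estimator below is the canonical one from the HumanEval evaluation; with a single sample per problem it reduces to the fraction of problems whose generated solution passes all unit tests:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Averaging pass_at_k(n, c, 1) over all 164 HumanEval problems yields the
# aggregate score, e.g. the reported 91.5% for code generation.
```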

In test generation, AutoDev attains a Pass@1 of 87.8%, producing tests whose coverage is comparable to that of human-written tests. These metrics underscore AutoDev's contextual understanding and execution capabilities, achieved within a fraction of the tokens expended by baseline executions.

Command Efficiency

AutoDev's typical task execution, shown in Figure 3, involves approximately 5.5 to 6.5 commands, spanning write, test, and retrieval operations. Although AutoDev requires more inference calls than single-inference approaches, it integrates these calls with validation steps that would otherwise require manual developer intervention.

Figure 3: Cumulative number of commands used by AutoDev for an average task of Code and Test Generation.
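
To make this command vocabulary concrete, here is a hedged sketch of how a Tools Library might register and dispatch commands such as `retrieve` and `write`. The decorator-based registration, handler signatures, and error path are illustrative assumptions, not AutoDev's published API:

```python
from typing import Callable

HANDLERS: dict[str, Callable[..., str]] = {}

def command(name: str):
    """Register a handler under a command name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        HANDLERS[name] = fn
        return fn
    return register

@command("retrieve")
def retrieve(path: str) -> str:
    with open(path) as f:
        return f.read()

@command("write")
def write(path: str, content: str) -> str:
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes to {path}"

def dispatch(name: str, **kwargs) -> str:
    """Route an agent-issued command, rejecting anything outside the permitted set."""
    if name not in HANDLERS:
        return f"error: command '{name}' is not permitted"  # guardrail response
    return HANDLERS[name](**kwargs)
```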

Discussion and Integration

AutoDev's design facilitates interaction between developers and AI agents within a collaborative framework. The paper's exploratory investigations indicate that multi-agent setups further improve task execution accuracy and adaptability, particularly for complex or evolving software challenges.

Plans to incorporate AutoDev within IDEs and CI/CD workflows promise to streamline the development pipeline, allowing seamless integration into existing development practices. This strategy aligns with future expansions to handle more comprehensive task sets, leveraging enhanced model architectures and infrastructures.

Conclusion

AutoDev marks a substantial advance in AI-driven development, propelling capabilities from rudimentary code suggestions to comprehensive task fulfillment within a secure environment. Its autonomous architecture redefines the role of AI in software development, embedding efficiency, security, and a broader scope of intelligence into day-to-day engineering workflows. Integrating AutoDev across diverse deployment platforms would further cement its utility and transformative potential in software development.
