OpenHands: An Open Platform for AI Software Developers as Generalist Agents (2407.16741v3)

Published 23 Jul 2024 in cs.SE, cs.AI, and cs.CL

Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in LLMs, there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.

Summary

The paper presents a novel platform that enables AI agents to autonomously develop software by writing code, engaging with CLIs, and browsing the web.
Its architecture integrates a secure, sandboxed environment with multi-agent collaboration and an evaluation framework covering 15 benchmark tasks.
The platform’s open-source MIT license and robust agent skills library promote community-driven innovation for both generalist and specialist AI agents.

OpenDevin: A Platform for AI Software Development

The paper "OpenDevin: An Open Platform for AI Software Developers as Generalist Agents" presents OpenDevin (since renamed OpenHands), a platform for building flexible AI agents that interact with their environment much as human developers do: by writing code, working in a command line interface, and browsing the web.

Introduction and Motivation

The paper begins by highlighting the growing capabilities of LLMs in complex tasks such as code generation, web browsing, and software development. The authors propose OpenDevin as an open platform for building and evaluating such agents on real-world tasks. Its features include implementing new agents, executing code in a safe, sandboxed environment, coordinating multiple agents, and integrating evaluation benchmarks.

Architectural Components

OpenDevin’s architecture is designed for both building and evaluating AI agents. The platform has five main elements:

Interaction Mechanism: It allows user interfaces, agents, and environments to interact through a flexible event stream architecture.
Sandboxed Environment: It offers a secure environment to safely execute code and system commands, using sandboxed operating systems and web browsers.
Agent Skills Interface: Agents can create complex software, execute code, and browse websites for information retrieval.
Multi-agent Collaboration: Multiple agents can collaborate, delegating and receiving tasks from one another.
Evaluation Framework: It benchmarks AI agents on 15 challenging tasks across categories such as software engineering and web browsing.
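
To make the event stream idea in the list above more concrete, here is a minimal sketch of how agents and a runtime might exchange actions and observations through a shared stream. Every name in it (`Event`, `EventStream`, `make_echo_runtime`) is hypothetical and chosen for illustration; it is not the platform's actual API, and the "runtime" only echoes commands rather than executing them in a sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    source: str   # who emitted the event, e.g. "agent" or "runtime" (assumed labels)
    kind: str     # "action" or "observation"
    content: str


@dataclass
class EventStream:
    """Hypothetical append-only log that notifies subscribers of each event."""
    events: List[Event] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, event: Event) -> None:
        # Record the event first, then fan it out to every subscriber.
        self.events.append(event)
        for callback in list(self.subscribers):
            callback(event)


def make_echo_runtime(stream: EventStream) -> Callable[[Event], None]:
    """Toy stand-in for a sandbox: answers each action with an observation."""
    def on_event(event: Event) -> None:
        if event.kind == "action":
            stream.publish(Event("runtime", "observation", f"ran: {event.content}"))
    return on_event


stream = EventStream()
stream.subscribe(make_echo_runtime(stream))
stream.publish(Event("agent", "action", "ls"))

# The stream now holds the action followed by the observation it triggered.
history = [(e.source, e.kind, e.content) for e in stream.events]
```

Keeping every action and observation in one ordered log is what would let a user interface, an agent, and an environment all follow the same interaction without knowing about each other directly.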

Agent Implementation and Skills

The platform provides an agent abstraction layer in which the state, actions, and observations are explicitly defined. A notable component is the AgentSkills library, which bundles utilities that extend an agent's capabilities, such as file editing, image parsing, and interaction with IPython notebooks. This design is intended to make agent functionality easy to create, extend, and test.
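
As a rough illustration of an abstraction built around state, actions, and observations, the sketch below defines an agent as anything that maps the current state to a next action, plus a small loop that feeds observations back in. The names `State`, `Agent`, `ScriptedAgent`, and `run` are assumptions made for this example, and actions are plain strings for brevity; the real platform's classes and signatures may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class State:
    # History of (action, observation) pairs seen so far.
    history: List[Tuple[str, str]] = field(default_factory=list)


class Agent(ABC):
    """Hypothetical agent interface: one decision per call."""

    @abstractmethod
    def step(self, state: State) -> str:
        """Return the next action given the current state."""


class ScriptedAgent(Agent):
    """Toy agent that replays a fixed list of actions, then finishes."""

    def __init__(self, script: List[str]) -> None:
        self.script = list(script)

    def step(self, state: State) -> str:
        index = len(state.history)
        return self.script[index] if index < len(self.script) else "finish"


def run(agent: Agent, execute: Callable[[str], str], max_steps: int = 10) -> State:
    """Alternate agent decisions and environment observations until 'finish'."""
    state = State()
    for _ in range(max_steps):
        action = agent.step(state)
        if action == "finish":
            break
        state.history.append((action, execute(action)))
    return state


# The lambda stands in for an environment that would actually run the action.
final_state = run(ScriptedAgent(["edit file.py", "run tests"]), lambda a: f"ok: {a}")
```

Because the agent only sees a `State` and returns an action, it can be tested with a fake `execute` function like the one above, which is one way such an abstraction could support testing.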

Moreover, OpenDevin supports creating both generalist and specialist agents, providing a multi-agent coordination framework. For instance, a generalist CodeActAgent can delegate specific sub-tasks such as web browsing to a more specialized BrowsingAgent.
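
The delegation pattern described here can be sketched as a generalist that hands matching sub-tasks to a specialist and handles everything else itself. `GeneralistAgent` and `BrowsingSpecialist` are hypothetical toy classes, and the `"browse:"` prefix is an invented routing rule; how CodeActAgent actually decides when to delegate to BrowsingAgent is not specified in this summary.

```python
class BrowsingSpecialist:
    """Toy specialist: pretends to look something up on the web."""

    def handle(self, task: str) -> str:
        return f"[browsing] result for '{task}'"


class GeneralistAgent:
    """Toy generalist: delegates tasks with a known prefix, handles the rest."""

    def __init__(self, delegates: dict) -> None:
        # Maps a task prefix (an assumed convention) to a specialist agent.
        self.delegates = delegates

    def handle(self, task: str) -> str:
        for prefix, specialist in self.delegates.items():
            if task.startswith(prefix):
                # Strip the routing prefix and pass the sub-task along.
                subtask = task[len(prefix):].strip()
                return specialist.handle(subtask)
        return f"[code] handled '{task}'"


generalist = GeneralistAgent({"browse:": BrowsingSpecialist()})
delegated = generalist.handle("browse: latest pytest release")
handled_locally = generalist.handle("fix failing test")
```

In this sketch the specialist's result is returned straight to the caller; a fuller design would presumably feed it back to the generalist as an observation so it can continue its own task.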

Evaluation Benchmarks

OpenDevin includes an extensive collection of benchmarks to systematically assess the capabilities of the agents. The evaluations encompass:

Software Engineering: Benchmarks like SWE-Bench and HumanEvalFix challenge the agents to resolve real-world software development issues and bugs.
Web Browsing: Benchmarks such as WebArena and MiniWoB++ test the agents' navigation and interaction skills on various web interfaces.
Miscellaneous Assistance: Benchmarks like GAIA and GPQA evaluate the agents on real-world problem solving outside pure coding or browsing domains.

The results reveal that OpenDevin agents perform competitively across a broad spectrum of tasks, demonstrating their versatility and the platform's capability to foster generalist agent development.
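
One simple way to compare an agent across several benchmarks is a per-benchmark success rate. The sketch below assumes each task yields a pass/fail outcome; the function name `success_rates`, the benchmark labels, and the outcomes are all made up for illustration and are not results or metrics taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate (benchmark_name, passed) pairs into a pass rate per benchmark."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for benchmark, ok in results:
        totals[benchmark] += 1
        passed[benchmark] += int(ok)
    return {name: passed[name] / totals[name] for name in totals}


# Invented outcomes on hypothetical benchmarks, for illustration only.
toy_results = [
    ("toy_coding", True),
    ("toy_coding", False),
    ("toy_web", False),
    ("toy_web", False),
    ("toy_qa", True),
]
rates = success_rates(toy_results)
```

Real benchmarks often define success differently (for example, passing a repository's test suite versus reaching a target web page state), so a shared harness would still need a per-benchmark checker that produces the pass/fail value.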

Practical and Theoretical Implications

Practically, OpenDevin aims to facilitate the creation of robust, safe, and deployable AI agents that can enhance productivity in software engineering and web-based tasks. Its release under the permissive MIT license underlines a commitment to open-source development and community contribution.

Theoretically, OpenDevin promotes the development of adaptable, generalist agents capable of a wide variety of tasks. The platform's evaluation suite offers a common yardstick for gauging progress in the AI community.

Future Work

The authors acknowledge several areas for improvement: better multi-modality support, stronger agent implementations, improved web browsing through retry-on-error strategies, and a more stable runtime environment. Future directions include automatic workflow generation via graph-based frameworks and a stronger focus on security and safety research for AI agents.

Conclusion

OpenDevin is a step toward versatile AI agents that can handle complex, multi-faceted tasks. Its community-driven development model, extensive benchmarking, and agent skills library give researchers and practitioners a structured environment for advancing both autonomous software developers and generalist agents.
