Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification
(2405.15414)
Abstract
Building open-ended agents has always been the ultimate goal in AI research, and creative agents are the most enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., "mine diamonds" in Minecraft). However, they struggle with creative tasks that have open goals and abstract criteria, because they cannot bridge the gap between the two and therefore lack the feedback needed to improve themselves while solving the task. In this work, we introduce autonomous embodied verification techniques that let agents fill this gap, laying the groundwork for creative tasks. Specifically, we propose Luban, an agent that targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification inspired by human design practices: (1) visual verification of 3D structural speculations, which come from CAD modeling programs synthesized by the agent; and (2) pragmatic verification of the creation, performed by generating and verifying environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines (by 33% to 100%) in both visualization and pragmatism. Additional demos on a real-world robotic arm show Luban's potential for creation in the physical world.
Overview
- Luban is an AI agent designed to execute creative building tasks in Minecraft using techniques inspired by human design practices and autonomous verification methods.
- It employs a two-stage process, 3D Structural Speculation with Visual Verification followed by Construction with Pragmatic Verification, to ensure both the visual appeal and the functionality of its structures.
- Luban achieved strong results on five Minecraft building tasks: high human quality ratings, the highest Elo ratings among the compared methods, and a 100% pass rate in pragmatic verification.
Luban: An AI Agent for Creative Minecraft Building Tasks
Introduction
Have you ever wondered if AI could handle creative tasks that lack clear-cut goals? Well, that's precisely what the Luban agent addresses. Traditional AI agents thrive on tasks with well-defined objectives, like mining diamonds in Minecraft. But when it comes to more inventive tasks, such as building a structurally sound and visually appealing house, the goals are often abstract. Luban is designed to bridge this gap by introducing autonomous embodied verification techniques inspired by human design practices.
Overview of Luban's Approach
Luban is an AI agent designed to perform creative building tasks in Minecraft whose goals are open-ended and whose success criteria are abstract. It does this through two stages of autonomous embodied verification:
- 3D Structural Speculation with Visual Verification: In this stage, the agent writes CAD modeling programs that turn the task instructions into 3D models. The visual aspects of these models are then verified using vision-language models (VLMs).
- Construction with Pragmatic Verification: Once the structure is visually verified, the agent constructs it within the Minecraft environment. After construction, the agent verifies the structure's functionality by generating and running environment-relevant functionality programs, for example checking that doors open properly or that a bridge is walkable.
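To make the control flow concrete, here is a minimal Python sketch of the two stages as a single loop. It is not Luban's code: `propose`, `visual_score`, `build`, and `check_functions` are hypothetical stand-ins for the LLM that writes CAD programs, the VLM judge, the Minecraft builder, and the generated functionality programs. Feeding the names of failed checks back into the next round is an assumption about how refinement could work.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Design:
    program: str        # hypothetical: text of a CAD modeling program
    score: float = 0.0  # filled in by the visual check

def run_pipeline(
    task: str,
    propose: Callable[[str, str], List[Design]],     # stage 1: speculate candidates
    visual_score: Callable[[Design], float],         # stage 1: visual verification
    build: Callable[[Design], object],               # stage 2: construct in the world
    check_functions: Callable[[object], List[str]],  # stage 2: names of failed checks
    max_rounds: int = 3,
) -> Optional[object]:
    """Return the first built structure that passes every functionality check,
    or None if none does within `max_rounds` rounds."""
    feedback = ""
    for _ in range(max_rounds):
        candidates = propose(task, feedback)
        if not candidates:
            continue
        for design in candidates:
            design.score = visual_score(design)
        best = max(candidates, key=lambda d: d.score)  # keep the best-looking design
        structure = build(best)
        failures = check_functions(structure)
        if not failures:
            return structure
        feedback = "; ".join(failures)  # assumption: failures guide the next round
    return None
```

With toy stubs, a first-round design that fails a "no door" check can be replaced in the second round by one that passes, because the failure message is passed to `propose`.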
Key Features
3D Structural Speculation
The first stage involves:
- Decomposing: Breaking down the task into smaller, manageable subcomponents.
- Subcomponent Generation: Converting these subcomponents into 3D CAD models.
- Assembling: Putting the subcomponents together into a complete 3D object.
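The decompose, generate, and assemble steps can be illustrated with a toy voxel model. This is a simplified stand-in, not the paper's CAD pipeline: each hypothetical subcomponent is an axis-aligned box of blocks, and assembling places each part at an offset, with later parts (such as a door) overwriting earlier ones.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) block coordinates, y is up

def box(w: int, h: int, d: int) -> Set[Voxel]:
    """Solid w x h x d box with its corner at the origin."""
    return {(x, y, z) for x in range(w) for y in range(h) for z in range(d)}

def hollow_box(w: int, h: int, d: int) -> Set[Voxel]:
    """Box with only its outer shell filled (walls, floor, and ceiling)."""
    return {(x, y, z) for (x, y, z) in box(w, h, d)
            if x in (0, w - 1) or y in (0, h - 1) or z in (0, d - 1)}

def translate(voxels: Set[Voxel], offset: Voxel) -> Set[Voxel]:
    """Shift every voxel by the given (dx, dy, dz) offset."""
    ox, oy, oz = offset
    return {(x + ox, y + oy, z + oz) for x, y, z in voxels}

def assemble(parts: Dict[str, Tuple[Set[Voxel], Voxel]]) -> Dict[Voxel, str]:
    """Place each named subcomponent at its offset. Parts are applied in
    insertion order, so a later part (e.g. a door) overwrites earlier blocks."""
    world: Dict[Voxel, str] = {}
    for name, (voxels, offset) in parts.items():
        for v in translate(voxels, offset):
            world[v] = name
    return world

# Decomposition of a small hut into three hypothetical subcomponents.
hut = assemble({
    "walls": (hollow_box(5, 4, 5), (0, 0, 0)),
    "roof": (box(5, 1, 5), (0, 4, 0)),
    "door": (box(1, 2, 1), (2, 1, 0)),
})
```

Keeping each subcomponent as a separate, parameterized generator mirrors the idea of decomposing the task: a part can be regenerated or moved without touching the rest of the structure.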
Visual Verification
Multiple candidate CAD models are generated and evaluated visually with a VLM to filter out designs that do not fit the task. This ensures that only the candidates that best match the instructions proceed to the next stage.
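A minimal sketch of the filtering idea, assuming each candidate design has already been rendered from several viewpoints and scored in [0, 1] by a VLM. The per-view scores, the averaging, and the 0.5 threshold are all assumptions for illustration, not details taken from the paper.

```python
from statistics import mean
from typing import Dict, List, Optional

def select_design(view_scores: Dict[str, List[float]],
                  threshold: float = 0.5) -> Optional[str]:
    """Discard candidates whose mean per-view score is below `threshold`,
    then return the highest-scoring survivor (or None if all are discarded)."""
    means = {name: mean(scores) for name, scores in view_scores.items() if scores}
    passing = {name: s for name, s in means.items() if s >= threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)

# Hypothetical VLM scores for three candidates, one score per rendered view.
scores = {"A": [0.9, 0.7, 0.8], "B": [0.95, 0.2, 0.3], "C": [0.4, 0.45, 0.5]}
```

Here `select_design(scores)` picks "A". Averaging across views penalizes a candidate like "B", which looks good from one angle but poor from the others.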
Pragmatic Verification
Here, the focus is on ensuring that the structure is functional within the Minecraft environment. For example, the agent checks whether players can enter the house through its doors or cross a bridge without falling off.
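Luban has the agent generate its own environment-relevant functionality programs; the hand-written check below is only a hypothetical example of what one such program might test for a bridge. It treats the bridge deck as a top-down grid where `#` is a block a player can stand on and uses breadth-first search to look for a path. A real Minecraft check would also need to account for height changes, headroom, and falling, which this sketch ignores.

```python
from collections import deque
from typing import List, Tuple

def can_cross(deck: List[str], start: Tuple[int, int], goal: Tuple[int, int]) -> bool:
    """BFS over a top-down deck map: '#' is a solid block a player can stand on,
    any other character is a gap. Movement is 4-connected (no diagonals)."""
    rows, cols = len(deck), len(deck[0])

    def walkable(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and deck[r][c] == "#"

    if not (walkable(*start) and walkable(*goal)):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if walkable(*nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

intact = ["#####",
          "#####"]
broken = ["##.##",
          "##.##"]  # a full-width gap in the middle column
```

`can_cross(intact, (0, 0), (1, 4))` succeeds, while `can_cross(broken, (0, 0), (0, 4))` fails because no walkable block bridges the middle column.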
Significant Numerical Results
Luban was tested on five different Minecraft tasks: arrow tower, bridge, Chinese ancient house, stair, and two-story house. The main results:
- Quality Ratings: In human studies, Luban received high ratings across multiple dimensions, such as Appearance, Complexity, and Aesthetics, outperforming the baselines.
- Elo Ratings: In one-to-one comparisons against other methods, Luban achieved a nearly perfect win rate and an Elo rating significantly higher than the others.
- Pragmatic Verification Pass Rates: Luban had a 100% pass rate in pragmatic verification, indicating that its creations met the functional requirements of each task.
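Elo ratings turn pairwise human preferences into a single score per method. Under the standard Elo model, method A's expected score against B is E_A = 1 / (1 + 10^((R_B - R_A) / 400)), and after a comparison with outcome S_A (1 for a win, 0.5 for a tie, 0 for a loss) its rating becomes R_A + K(S_A - E_A). The sketch below applies that update sequentially; the starting rating of 1000 and K = 32 are common defaults, not the paper's settings.

```python
from typing import Dict, Iterable, Tuple

def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, score_a: float,
           k: float = 32.0) -> Tuple[float, float]:
    """Return new (r_a, r_b) after one comparison; score_a is 1, 0.5, or 0."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta  # B loses exactly what A gains

def run_elo(matches: Iterable[Tuple[str, str, float]],
            initial: float = 1000.0, k: float = 32.0) -> Dict[str, float]:
    """Apply updates in order; methods not seen before start at `initial`."""
    ratings: Dict[str, float] = {}
    for a, b, score_a in matches:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = update(r_a, r_b, score_a, k)
    return ratings
```

For two equally rated methods, a single win moves the ratings from (1000, 1000) to (1016, 984). Because updates are applied in order, final ratings depend on the order of comparisons; one common remedy is to average the results over several shuffled orderings.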
Implications
Practical Implications
Luban's techniques could change the way AI handles creative tasks, not just in digital environments like Minecraft but also in real-world applications. For example:
- Architecture: Imagine AI autonomously designing and verifying the functionality of complex structures.
- Robotics: Luban's methodologies could be adapted for robotics, enabling more intelligent and adaptive machines; the paper already shows this potential with demos on a real-world robotic arm.
Theoretical Implications
From a theoretical standpoint, Luban’s autonomous verification approach contributes significantly to the development of more advanced AI. It opens the door to creating agents capable of handling abstract, open-ended tasks that require iterative feedback and improvement.
Future Developments
The paper suggests several promising avenues for future research:
- Real-World Applications: Extending Luban’s pragmatic verification to real-world tasks, potentially leading to AI agents that can perform creative building tasks in physical environments.
- Enhanced Libraries: Developing extensive libraries that can bridge VLMs with the physical world, fostering the emergence of agents with spatial intelligence.
Final Thoughts
Luban is a step forward in making AI more adept at handling creative, open-ended tasks. By incorporating human-like iterative verification and refinement processes, it shows promise in both virtual and real-world applications. As research continues, we can expect even more sophisticated and functional AI agents in various domains.