Emergent Mind

Abstract

Large Language Model (LLM) based agents have garnered significant attention and are becoming increasingly popular. Planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achieving a desired goal from an initial state. This paper investigates enhancing the planning abilities of LLMs through instruction tuning, referred to as agent training. Recent studies have demonstrated that utilizing expert-level trajectories for instruction-tuning LLMs effectively enhances their planning capabilities. However, existing work primarily focuses on synthesizing trajectories from manually designed planning tasks and environments. The labor-intensive nature of creating these environments and tasks impedes the generation of sufficiently varied and extensive trajectories. To address this limitation, this paper explores the automated synthesis of diverse environments and a gradual range of planning tasks, from easy to difficult. We introduce a framework, AgentGen, that leverages LLMs first to generate environments and subsequently generate planning tasks conditioned on these environments. Specifically, to improve environmental diversity, we propose using an inspiration corpus composed of various domain-specific text segments as the context for synthesizing environments. Moreover, to increase the difficulty diversity of generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, that evolves planning tasks in both easier and harder directions to synthesize a task set with a smoother difficulty curve. The evaluation results derived from AgentBoard show that AgentGen greatly improves LLMs' planning ability, e.g., the AgentGen instruction-tuned Llama-3 8B surpasses GPT-3.5 in overall performance. Moreover, in certain tasks, it even outperforms GPT-4.

Figure: The agent training process — task preparation, trajectory synthesis, and instruction tuning — with the automated task-generation framework AgentGen.

Overview

  • The paper introduces AgentGen, a framework that leverages LLMs to generate diverse environments and planning tasks, thereby enhancing the planning abilities of LLM-based agents.

  • AgentGen uses a bidirectional evolution method, Bi-Evol, to evolve planning tasks in both easier and harder directions, ensuring a smoother difficulty curve and improving agent training.

  • Empirical evaluations showed significant performance improvements for AgentGen-tuned LLMs on both in-domain and out-of-domain tasks compared to existing models like GPT-3.5 and GPT-4.

Enhancing Planning Capabilities in LLM-based Agents Through Environment and Task Generation: Insights from AgentGen

The paper "AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation" addresses the challenge of improving planning abilities in Large Language Model (LLM)-based agents through a novel framework for automated environment and task synthesis. This research is significant within the AI community because creating diverse and complex environments and planning tasks is labor-intensive, an issue that has traditionally impeded the generation of the varied and extensive trajectory data necessary for effective agent training.

The core contribution of this paper is the introduction of AgentGen, a framework that leverages LLMs to generate environments and planning tasks, significantly expanding the diversity and range of these tasks. The authors present a method utilizing an inspiration corpus to ensure environmental diversity and a bidirectional evolution approach, termed Bi-Evol, to evolve planning tasks in both simpler and more complex directions. This dual approach helps create a smoother difficulty curve, enhancing the learning process of LLMs.

Methodology

AgentGen is structured around two main stages:

  1. Environment Generation:

    • Utilizing an inspiration corpus composed of diverse text segments as the context, LLMs first generate environment specifications. These text segments ensure a broad range of scenarios and domains.
    • Following the specification generation, corresponding environment code is produced in formats such as Python and PDDL.
    • An environment library is iteratively expanded, incorporating newly generated high-quality environments to serve as in-context examples.
  2. Task Generation:

    • Conditioned on the generated environment, LLMs create multiple planning tasks.
    • To achieve a gradual difficulty progression, Bi-Evol is introduced. This method evolves planning tasks towards both simplification and increased complexity, ensuring a smooth difficulty curve.
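The two-stage pipeline above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not the paper's implementation: `call_llm` is a hypothetical stand-in for any chat-completion API (stubbed here so the sketch runs end to end), and the prompts, corpus segments, and library bookkeeping are assumptions for the sake of the example.

```python
# Sketch of the two-stage AgentGen pipeline: environment generation
# conditioned on an inspiration corpus, then Bi-Evol task generation.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned response."""
    return f"<generated from: {prompt[:40]}...>"

def generate_environment(inspiration_segment: str, library: list[str]) -> str:
    """Stage 1: condition on a domain-specific text segment plus
    in-context examples drawn from the growing environment library."""
    examples = "\n".join(library[-2:])  # few-shot examples from the library
    prompt = (f"Inspiration: {inspiration_segment}\n"
              f"Examples:\n{examples}\n"
              "Write an environment specification, then its code (Python/PDDL).")
    spec = call_llm(prompt)
    library.append(spec)  # iteratively expand the environment library
    return spec

def bi_evol(seed_task: str) -> list[str]:
    """Stage 2 (Bi-Evol): evolve a seed task in both directions
    to produce a smoother difficulty curve."""
    easier = call_llm(f"Simplify this planning task: {seed_task}")
    harder = call_llm(f"Add constraints to make this task harder: {seed_task}")
    return [easier, seed_task, harder]

corpus = ["brewing cocktails behind a bar", "stacking crates in a warehouse"]
library: list[str] = []
tasks: list[str] = []
for segment in corpus:
    env = generate_environment(segment, library)
    seed = call_llm(f"Propose a planning task for environment: {env}")
    tasks.extend(bi_evol(seed))

print(len(library), len(tasks))  # 2 environments, 6 tasks (3 per environment)
```

The key design point the sketch mirrors is that each inspiration segment yields one environment, and each environment seeds a small task set spanning easy, medium, and hard variants.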

Experimental Results

The empirical evaluation is extensive, encompassing both in-domain (PDDL-based) and out-of-domain tasks (implemented in Python). Key findings include:

Performance on In-Domain Tasks:

AgentGen instruction-tuned LLMs demonstrated substantial improvements. For instance, the AgentGen-tuned Llama3-8B surpassed GPT-3.5 in overall performance metrics, achieving a notable success rate improvement (11.67 vs. 5.0). Notably, AgentGen matched or outperformed GPT-4 on specific tasks like Barman.

Performance on Out-of-Domain Tasks:

Similar improvements were observed on tasks implemented in Python rather than PDDL. For example, AgentGen significantly boosted Llama3's success rates on Alfworld and BabyAI; on Alfworld, the tuned model's success rate surpassed GPT-3.5's (29.1 vs. 17.2).

Implications and Future Work

This research underscores the potential of automated environment and task generation in advancing the capabilities of LLM-based agents. The framework's ability to create diverse and progressively challenging planning tasks enhances the versatility and robustness of these agents. Practically, AgentGen can alleviate resource-intensive manual environment design, facilitating scalable agent training.

Theoretically, the methods presented open avenues for further exploration in automated curriculum design for agents. The bidirectional evolution method, in particular, highlights the importance of a nuanced approach to task difficulty that accommodates both ends of the complexity spectrum. Future developments can build on this by exploring more sophisticated evolution strategies or integrating this framework with other forms of machine learning feedback loops.

In conclusion, AgentGen represents a significant advancement in the automated synthesis of environments and planning tasks, addressing key limitations in agent training. Its ability to improve planning performance across various task domains highlights both its practical utility and its contribution to foundational AI research. Further refinement and application of these methods could continue to drive progress in the development of more capable and generalized LLM-based agents.
