Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

(arXiv:2312.14878)
Published Dec 22, 2023 in cs.AI and cs.LG

Abstract

A key method for creating AI agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception directly to action encounters severe problems, chief among them a lack of generality across multiple tasks and the need for large amounts of training data. The leading cause is that such a policy cannot effectively integrate prior information into the perception-action cycle. LLMs emerged as a fundamental way to incorporate cross-domain knowledge into AI agents, but they lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework for integrating and learning structured reasoning in AI agents' policies. Our methodology is motivated by the modularity found in the human brain. The framework utilises intrinsic and extrinsic functions to add prior understanding of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when structured reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.

Overview

  • The Pangu-Agent framework is designed to integrate structured reasoning into AI agent policies, allowing fine-tuning for the acquisition of new skills.

  • Structured reasoning is introduced into traditional reinforcement learning, reformulating policies to consider multiple cognitive steps, mirroring human cognition.

  • Intrinsic functions within the agent handle internal memory transformations, while extrinsic functions manage actions in response to the external environment.

  • Evaluations show that agents utilizing structured reasoning outperform those that don't, and improve significantly further when fine-tuned through supervised and reinforcement learning methods.

  • Future developments aim to enhance full differentiability, apply the framework to real-world tasks, and improve memory and tool usage capabilities.

Introduction to Pangu-Agent Framework

The Pangu-Agent framework introduces a nuanced approach to integrating structured reasoning into AI agents' policies while allowing fine-tuning for new skills. Inspired by the human brain's modular cognitive processes, the framework combines intrinsic and extrinsic functions to simulate reasoning, leveraging prior knowledge while remaining adaptable through learning.

Structured Reasoning and Policy Formulation

At the crux of Pangu-Agent is the concept of structured reasoning. Traditional reinforcement learning (RL) objectives are transformed by introducing intrinsic functions that reformulate policies to include multiple 'thinking' steps. These functions, acting on the agent's internal state or memory, enable a nested set of cognition-inspired operations. Such structures were previously absent from standard RL formulations but are critical in scaling agents across diverse tasks. Agents learn from both their experiences and their interactions with the environment, thus creating a memory that evolves and informs their decision-making.
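The nesting described above can be sketched in a few lines. This is a purely illustrative sketch, not the paper's actual API: the `llm` stub stands in for a real language-model call, and the `think`, `plan`, and `policy` names are assumptions chosen for clarity.

```python
from typing import Callable, List

def llm(prompt: str) -> str:
    # Stub standing in for a language-model call; illustrative only.
    return f"response({prompt})"

# An intrinsic function maps (memory, observation) -> updated memory.
IntrinsicFn = Callable[[List[str], str], List[str]]

def think(memory: List[str], obs: str) -> List[str]:
    # One 'thinking' step: append a reasoning trace about the observation.
    return memory + [llm(f"think about: {obs}")]

def plan(memory: List[str], obs: str) -> List[str]:
    # A further cognitive step conditioned on the updated memory.
    return memory + [llm(f"plan given: {memory[-1]}")]

def policy(obs: str, memory: List[str],
           intrinsic_fns: List[IntrinsicFn]) -> str:
    # Nest the intrinsic functions over memory, then let the extrinsic
    # step map the final memory to an action.
    for fn in intrinsic_fns:
        memory = fn(memory, obs)
    return llm(f"act on: {memory[-1]}")

action = policy("door is locked", [], [think, plan])
```

The point of the reformulation is that the composition of intrinsic steps is itself part of the policy, so different nestings (e.g. reflect-then-plan versus plan-only) yield different policies over the same underlying model.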

Intrinsic and Extrinsic Functions

Intrinsic functions define the internal thought process of an agent, handling memory transformation based on observations and previous knowledge. They encapsulate complex operations like reflection, planning, and tool usage. Extrinsic functions, in contrast, are responsible for the agent's interactions with its external environment. They dictate the actions taken based on observations and modified memory states.
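The split between the two function types can be illustrated with a toy agent-environment loop. Everything here is a hedged sketch under assumed names (`ToggleEnv`, `Agent`, `intrinsic`, `extrinsic`); the framework's real intrinsic functions perform far richer operations such as reflection, planning, and tool calls.

```python
class ToggleEnv:
    """Toy two-state environment: the action 'toggle' flips the state."""
    def __init__(self):
        self.state = 0

    def observe(self) -> int:
        return self.state

    def step(self, action: str) -> None:
        if action == "toggle":
            self.state = 1 - self.state

class Agent:
    def __init__(self):
        self.memory = []  # evolving internal memory

    def intrinsic(self, obs: int) -> None:
        # Intrinsic function: transform internal memory from the new
        # observation (here, simply record it).
        self.memory.append(obs)

    def extrinsic(self) -> str:
        # Extrinsic function: choose an external action from the
        # modified memory state.
        return "toggle" if self.memory[-1] == 0 else "wait"

env, agent = ToggleEnv(), Agent()
for _ in range(3):
    obs = env.observe()
    agent.intrinsic(obs)         # internal 'thought' step
    env.step(agent.extrinsic())  # external action step
```

Note that only the extrinsic function touches the environment; the intrinsic function changes nothing outside the agent, which is exactly the separation the framework draws.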

Evaluation and Fine-Tuning

The paper presents a detailed evaluation showing how structured reasoning improves AI agents' success in task-solving. Comparing first-order and composite methods across different tasks, the results suggest that fine-tuned agents backed by structured reasoning significantly outperform their counterparts. Pangu-Agent demonstrates strong adaptability and performance through Supervised Fine-Tuning (SFT) and Reinforcement Learning Fine-Tuning (RLFT), with marked improvements across various domains.
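The core SFT objective is to raise the log-likelihood of demonstrated actions under the policy. The toy below makes that concrete with a tabular softmax policy and hand-derived gradients; the observation names, learning rate, and the tabular setup are all illustrative assumptions, since the actual framework fine-tunes the weights of a large language model.

```python
import math
from collections import defaultdict

ACTIONS = ["left", "right"]
# One logit vector per observation (a stand-in for LLM parameters).
logits = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def probs(obs):
    # Softmax over the logits for this observation.
    z = {a: math.exp(v) for a, v in logits[obs].items()}
    s = sum(z.values())
    return {a: v / s for a, v in z.items()}

# Hypothetical expert demonstrations: (observation, expert action) pairs.
demos = [("wall_left", "right")] * 20 + [("wall_right", "left")] * 20

lr = 0.5
for obs, expert_a in demos:
    p = probs(obs)
    for a in ACTIONS:
        # Gradient of log p(expert_a | obs) w.r.t. each logit:
        # 1[a == expert_a] - p(a).
        logits[obs][a] += lr * ((1.0 if a == expert_a else 0.0) - p[a])
```

After training, the policy assigns high probability to the expert's action in each observed state, which is the behaviour SFT aims for before RLFT refines it against task reward.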

Future Directions

The paper concludes by highlighting potential areas for future development such as full differentiability of the framework, real-world applications, advanced memory retrieval, and tool usage enhancements. These improvements aim to refine the Pangu-Agent framework even further, setting the stage for the development of truly generalist AI agents.
