

Decision making demands an intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making suffer from low sample efficiency and poor generalization. In contrast, foundation models in language and vision have shown rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by a formulation of foundation agents, with their fundamental characteristics and challenges, motivated by the success of LLMs. Moreover, we specify the roadmap of foundation agents, from large-scale interactive data collection or generation, to self-supervised pretraining and adaptation, to knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.

Roadmap of foundation agents: scaling with data collection, self-supervised pretraining, Transformer architecture.


  • The paper introduces 'Foundation Agents,' highlighting their potential to revolutionize learning paradigms in decision-making akin to the impact of large foundation models in language and vision tasks.

  • It outlines the attributes of foundation agents, including a unified representation of variables, a consistent policy framework, and interactive decision-making capabilities required to handle diverse tasks.

  • The development roadmap includes large-scale data collection, self-supervised pretraining, and alignment with LLMs, while addressing challenges like unified modeling, optimization, and handling open-ended tasks.

Overview of "Foundation Agents as the Paradigm Shift for Decision Making"

The paper "Position: Foundation Agents as the Paradigm Shift for Decision Making" introduces the concept of foundation agents and argues that they could change how agents are trained, much as large foundation models changed language and vision tasks. The discussion centers on designing foundation agents that improve sample efficiency and generalization in complex decision-making scenarios.

Key Elements of Foundation Agents

Foundation agents are conceptualized as generally capable agents adept at handling diverse decision-making tasks across physical and virtual environments. The core attributes of foundation agents include:

  1. Unified Representation: A universal framework to represent variables within the decision process, which includes state-action spaces, feedback signals like rewards or goals, and environmental dynamics.
  2. Unified Policy Interface: A consistent policy framework applicable across disparate tasks and domains, including robotics, gameplay, and healthcare.
  3. Interactive Decision-Making: The ability to reason about behaviors, address environment stochasticity and uncertainty, and navigate multi-agent competitive or cooperative scenarios.
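
The first two attributes can be illustrated with a minimal Python sketch. This is not an API from the paper: the names `Step`, `Policy`, and `ZeroPolicy`, the `task` string, and the flat list-of-floats encoding are all hypothetical, chosen only to show what "one record format, one policy signature, many tasks" might look like.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Sequence


@dataclass(frozen=True)
class Step:
    """One decision step in a shared format (hypothetical schema)."""
    observation: Sequence[float]  # state or observation features
    action: Sequence[float]       # action taken at this step
    feedback: float               # reward or goal-progress signal


class Policy(Protocol):
    """A single call signature that every task uses (assumed interface)."""

    def act(
        self, task: str, history: Sequence[Step], observation: Sequence[float]
    ) -> Sequence[float]:
        ...


class ZeroPolicy:
    """Placeholder policy: returns an all-zero action of the task's size."""

    def __init__(self, action_dims: Dict[str, int]) -> None:
        self.action_dims = action_dims

    def act(self, task, history, observation):
        # A real agent would condition on history and observation here.
        return [0.0] * self.action_dims[task]


# The same object serves two unrelated (made-up) tasks.
policy: Policy = ZeroPolicy({"robot_arm": 7, "game": 1})
print(policy.act("robot_arm", [], [0.1, 0.2]))
print(policy.act("game", [Step([0.0], [1.0], 0.5)], [0.3]))
```

The point of the sketch is the shape of the interface: tasks differ only in the `task` identifier and the size of the vectors, not in how the policy is called.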

Roadmap to Foundation Agents

The paper delineates a strategic roadmap for the development of foundation agents, which involves several stages:

  1. Large-Scale Data Collection: Interactive data can be accumulated from various sources like the internet (e.g., YouTube videos, tutorials) and real-world interactions.
  2. Self-Supervised Pretraining: Utilizing unsupervised learning techniques to pretrain models on large volumes of unannotated data.
  3. Alignment with LLMs: Integrating knowledge and values encapsulated within LLMs to enhance foundation agents' reasoning and generalization capabilities.

Self-Supervised Pretraining and Adaptation

Self-supervised learning is pivotal for the foundational aspect of these agents. The pretraining involves two notable steps:

  1. Embedding Trajectories: This includes tokenizing trajectory sequences and utilizing various Transformer architectures for sequence modeling.
  2. Learning Objectives: The learning objectives encompass autoregressive or masked modeling techniques adapted from language and vision domains. A table in the paper summarizes these objectives.
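
The two steps above can be sketched in a few lines of Python. The summary does not say how trajectories are tokenized, so everything concrete here is an assumption made for illustration: scalar states, actions, and rewards in [-1, 1], uniform binning into `BINS` buckets, a separate id range per modality, and a reserved `MASK_ID`. The sketch only builds input/target pairs for the two objective families; it does not include a Transformer.

```python
import random
from typing import Dict, List, Sequence, Tuple

BINS = 16            # assumed number of bins per modality
MASK_ID = 3 * BINS   # id reserved for the mask token (assumption)


def discretize(x: float, low: float = -1.0, high: float = 1.0) -> int:
    """Clip x to [low, high] and map it to a bin index in [0, BINS - 1]."""
    x = min(max(x, low), high)
    return min(int((x - low) / (high - low) * BINS), BINS - 1)


def tokenize_trajectory(traj: Sequence[Tuple[float, float, float]]) -> List[int]:
    """Flatten (state, action, reward) steps into one token sequence."""
    tokens: List[int] = []
    for state, action, reward in traj:
        tokens.append(discretize(state))              # ids 0 .. BINS-1
        tokens.append(BINS + discretize(action))      # ids BINS .. 2*BINS-1
        tokens.append(2 * BINS + discretize(reward))  # ids 2*BINS .. 3*BINS-1
    return tokens


def autoregressive_pairs(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Next-token prediction: the target is the input shifted by one."""
    return tokens[:-1], tokens[1:]


def masked_example(
    tokens: List[int], ratio: float = 0.3, seed: int = 0
) -> Tuple[List[int], Dict[int, int]]:
    """Masked modeling: hide some positions, keep originals as targets."""
    rng = random.Random(seed)
    k = max(1, round(ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), k))
    corrupted = list(tokens)
    targets: Dict[int, int] = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        corrupted[pos] = MASK_ID
    return corrupted, targets


toks = tokenize_trajectory([(0.0, 1.0, -1.0), (0.5, -0.5, 0.0)])
print(toks)
print(autoregressive_pairs(toks))
print(masked_example(toks))
```

Giving each modality its own id range keeps a state bin from being confused with an action or reward bin once the steps are interleaved into a single sequence. A sequence model would then be trained to predict the shifted targets (autoregressive) or the hidden positions (masked).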

Challenges in Foundation Agents Development

Despite the promising capabilities, several challenges must be addressed:

  1. Unified or Compositional Models: There is an ongoing debate over whether to pursue a single unified model or a compositional approach that integrates existing foundation models, which may be more feasible.
  2. Optimization and Theoretical Foundations: How to optimize these agents, and the theory that would justify those methods, needs extensive research, particularly to ensure efficacy and robustness.
  3. Handling Open-Ended Tasks: Foundation agents must incorporate various strategies to handle open-ended tasks characterized by evolving objectives and environments.

Use Cases and Implications

The potential impact of foundation agents spans multiple domains:

  1. Autonomous Control: Including robotics and self-driving vehicles, where foundation agents can improve adaptability and robustness.
  2. Healthcare: Enhancing diagnostic accuracy and treatment personalization by leveraging vast medical data efficiently.
  3. Scientific Research: Accelerating discovery and experimentation processes, thereby expediting scientific advancements.


The paper posits that the integration of extensive interactive data, self-supervised learning, and the alignment with LLMs could significantly advance the development of foundation agents. Given the complexity and diversity of decision-making tasks, future research must navigate the challenges of unified modeling and optimization strategies, ensuring foundation agents' reliability and effectiveness in real-world applications. The evolution toward foundation agents potentially marks a significant shift in artificial intelligence, bringing us closer to achieving robust and versatile autonomous systems.

