Emergent Mind

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

(2312.15224)
Published Dec 23, 2023 in cs.AI and cs.HC

Abstract

AI agents powered by LLMs have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, in which players communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.

Overview

  • LLMs have widespread uses but struggle with inference latency in real-time scenarios.

  • Researchers developed a Hierarchical Language Agent that combines a 'Slow Mind' and a 'Fast Mind' to reduce latency.

  • The AI's performance was tested in the real-time game Overcooked, demonstrating effective human-AI cooperation.

  • HLA outperformed other baselines in human studies, with better game scores and quicker responsiveness.

  • The study highlights the need for hierarchical reasoning in AI for applications requiring immediate interaction.

Understanding Human-AI Coordination Through the Lens of LLMs

The Challenge of Interaction Latency

AI agents powered by LLMs have showcased impressive capabilities. Their adoption is widespread, with implications across various sectors, including content creation, robotics, and more. Despite their promising advantages, a major challenge surfaces when considering their application to real-time scenarios such as interactive gaming: inference latency. LLM-driven agents typically lean on LLM APIs coupled with complex prompts, incurring latencies ranging from several seconds to minutes. This latency critically undermines their effectiveness in domains calling for immediate interaction.

A Novel Hierarchical Language Agent

In response to this latency hurdle, researchers have introduced a Hierarchical Language Agent (HLA) capable of both reasoning and executing tasks in real time. HLA employs a hierarchical structure, combining a proficient LLM, referred to as Slow Mind, for intention reasoning and language-based communication, with a lightweight LLM, termed Fast Mind, for generating macro actions. Additionally, a reactive policy known as Executor translates these macro actions into executable atomic actions. This structure allows for efficient parsing of human instructions into actionable commands, significantly enhancing human-AI collaboration in time-sensitive tasks.

Overcooked as a Real-Time Testbed

This approach was evaluated using the cooperative cooking game Overcooked as a testbed. Within this environment, the agent cooperates with humans through natural language communication, handling frequent commands such as "Chop 3 tomatoes". The AI player promptly interprets and executes such instructions, demonstrating responsiveness and comprehension even of vague language cues. These operations are realized within tight time constraints, demonstrating effective real-time human-AI interaction.
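To make concrete what "interpreting a command" means here, the toy parser below turns an instruction like "Chop 3 tomatoes" into a list of repeated macro actions. This is a deliberately simplified stand-in for the Fast Mind's role, assuming a rigid verb-count-item phrasing; the actual agent uses a lightweight LLM precisely so it can handle free-form and vague phrasings that a regex cannot.

```python
import re


def command_to_macros(command: str) -> list[str]:
    """Parse a command like 'Chop 3 tomatoes' into repeated macro actions."""
    match = re.match(r"(\w+)\s+(\d+)\s+(\w+)$", command.strip())
    if not match:
        return []  # phrasing outside the toy grammar
    verb, count, item = match.groups()
    # Crude singularisation so 'tomatoes' becomes 'tomato'.
    for suffix in ("es", "s"):
        if item.endswith(suffix):
            item = item.removesuffix(suffix)
            break
    return [f"{verb.lower()}_{item.lower()}"] * int(count)


print(command_to_macros("Chop 3 tomatoes"))
# ['chop_tomato', 'chop_tomato', 'chop_tomato']
```

The gap between this brittle pattern-matcher and a genuinely language-capable Fast Mind is exactly the comprehension of vague cues that the study credits to the LLM-based design.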

HLA Outperforms Baselines in Human Studies

An empirical evaluation compared HLA against baseline agents, each ablating a specific HLA component. The human studies quantified its performance, revealing that HLA outperformed the baselines with a clear lead in game scores and faster action responses. Human participants favored HLA significantly over the slow-mind-only and fast-mind-only agents, underscoring its stronger cooperative skills, quicker responsiveness, and more consistent language communication.

In summary, the Hierarchical Language Agent puts forward a robust framework for real-time human-AI coordination tasks. The study underscores the importance of hierarchical reasoning and planning within AI systems for applications demanding high-frequency interaction and swift responses. This paves a promising avenue for more dynamic and responsive AI-driven collaboration in various real-time applications.
