LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination (2312.15224v2)

Published 23 Dec 2023 in cs.AI and cs.HC

Abstract: AI agents powered by LLMs have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed where players could communicate with natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides both strong reasoning abilities while keeping real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction, a lightweight LLM, referred to as Fast Mind, for generating macro actions, and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.

Summary

The paper presents a Hierarchical Language Agent (HLA) that pairs a Slow Mind for reasoning with a Fast Mind for macro-action generation, reducing latency in real-time interactions.
A reactive Executor policy translates macro actions into atomic actions, and the design is validated on the Overcooked gaming testbed.
Empirical studies show that HLA outperforms baseline agents in responsiveness and game scores, highlighting its effective human-AI coordination.

Understanding Human-AI Coordination Through the Lens of LLMs

The Challenge of Interaction Latency

LLM-powered agents have shown impressive capabilities and are being adopted across sectors such as content creation and robotics. A major obstacle appears, however, when they are applied to real-time scenarios such as interactive gaming: inference latency. These agents typically rely on LLM API calls combined with complex prompts, which can introduce delays ranging from several seconds to minutes. Such latency severely limits their usefulness in domains that demand immediate interaction.

A Novel Hierarchical Language Agent

To address this latency problem, the authors introduce a Hierarchical Language Agent (HLA) that can both reason and act in real time. HLA is organized as a hierarchy of three modules: a proficient LLM, referred to as Slow Mind, handles intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, generates macro actions; and a reactive policy, referred to as Executor, transforms those macro actions into atomic actions. This division lets the agent turn human instructions into executable actions quickly, which improves human-AI collaboration in time-sensitive tasks.
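The three-module hierarchy described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class names mirror the paper's module names, but every method, macro action, and atomic action name here is a hypothetical placeholder, and simple rules stand in for the LLM calls. The point it shows is the separation of a slow path (intention reasoning, run when the human speaks) from a fast path (macro action selection and expansion, run on every tick).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class SlowMind:
    """Stand-in for the proficient LLM: slow intention reasoning."""

    def infer_intention(self, human_message: str) -> str:
        # A real agent would query a large LLM here; this stub only normalizes text.
        return human_message.strip().lower()


class FastMind:
    """Stand-in for the lightweight LLM: quickly picks a macro action."""

    def next_macro_action(self, intention: Optional[str]) -> str:
        # Hypothetical rule in place of a small-model call.
        if intention is not None and "chop" in intention:
            return "chop_tomato"
        return "wait"


class Executor:
    """Reactive policy: expands a macro action into atomic actions."""

    # Hypothetical lookup table; the action names are illustrative only.
    ATOMIC: Dict[str, List[str]] = {
        "chop_tomato": ["move_to_tomato", "pick_up", "move_to_board", "chop"],
        "wait": ["noop"],
    }

    def to_atomic(self, macro_action: str) -> List[str]:
        return list(self.ATOMIC.get(macro_action, ["noop"]))


@dataclass
class HierarchicalAgent:
    slow: SlowMind = field(default_factory=SlowMind)
    fast: FastMind = field(default_factory=FastMind)
    executor: Executor = field(default_factory=Executor)
    intention: Optional[str] = None  # latest result from the slow path

    def on_human_message(self, message: str) -> None:
        # Slow path: runs only when the human speaks.
        self.intention = self.slow.infer_intention(message)

    def step(self) -> List[str]:
        # Fast path: runs every tick with whatever intention is available.
        macro = self.fast.next_macro_action(self.intention)
        return self.executor.to_atomic(macro)


agent = HierarchicalAgent()
before = agent.step()                      # no instruction received yet
agent.on_human_message("Chop 3 tomatoes")  # slow path updates the intention
after = agent.step()                       # fast path now acts on it
```

In this sketch `step()` never waits on the slow path, so the agent keeps acting while intention reasoning is pending. A real system would presumably run the slow path concurrently rather than in a blocking call.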

Overcooked as a Real-Time Testbed

The approach is evaluated in the cooperative cooking game Overcooked, where players communicate in natural language and cooperate to serve orders. Human players frequently issue commands such as "Chop 3 tomatoes", and the AI player must interpret and carry them out promptly, even when the wording is vague. HLA completes these operations within the game's time constraints, demonstrating effective real-time human-AI interaction.
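To make the "Chop 3 tomatoes" example concrete, the sketch below shows one way a quantified command could be unrolled into a queue of macro actions. It is an illustration only: in HLA this interpretation is done by LLMs, whereas here a regular expression stands in for that step, and the function name `parse_command`, the `SINGULAR` table, and the macro action naming scheme are all assumptions.

```python
import re
from typing import List

# Hypothetical plural-to-singular table for a few ingredients.
SINGULAR = {"tomatoes": "tomato", "onions": "onion"}

# Matches commands of the form "<verb> <count> <item>", e.g. "Chop 3 tomatoes".
COMMAND = re.compile(r"\s*(\w+)\s+(\d+)\s+(\w+)\s*")


def parse_command(text: str) -> List[str]:
    """Unroll a quantified command into one macro action per requested item."""
    match = COMMAND.fullmatch(text)
    if match is None:
        # Not a "<verb> <count> <item>" command; a real agent would defer to the LLM.
        return []
    verb = match.group(1).lower()
    count = int(match.group(2))
    item = match.group(3).lower()
    item = SINGULAR.get(item, item)
    return [f"{verb}_{item}"] * count


queue = parse_command("Chop 3 tomatoes")
```

Each entry in `queue` would then be handed to a low-level policy one at a time, so the agent can start acting on the first item while the rest remain queued.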

HLA Outperforms Baselines in Human Studies

An empirical evaluation compared HLA against baseline agents, each lacking specific HLA components, including slow-mind-only and fast-mind-only agents. In human studies, HLA achieved higher game scores and faster action responses than the baselines. Human participants also significantly preferred HLA over the slow-mind-only and fast-mind-only agents, reflecting its stronger cooperation abilities, faster responses, and more consistent language communication.

In summary, the Hierarchical Language Agent offers a robust framework for real-time human-AI coordination. The paper underscores the importance of hierarchical reasoning and planning in AI systems for applications that demand high-frequency interaction and swift responses, and it opens a promising avenue for more dynamic and responsive AI-driven collaboration in real-time applications.
