SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks (2305.17390v2)
Abstract: We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage integrates the strengths of behavior cloning and prompting LLMs to enhance task completion performance. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. The Swift module is a small encoder-decoder LM fine-tuned on the oracle agent's action trajectories, while the Sage module employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a heuristic method to harmoniously integrate the two modules, resulting in a more efficient and robust problem-solving process. In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflexion, demonstrating its effectiveness in solving complex interactive tasks.
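The abstract describes a control scheme that interleaves a fast Swift module with a slower, LLM-driven Sage module. The sketch below illustrates one way such a fast/slow loop could be wired together; it is a minimal illustration, not the paper's implementation, and every name in it (SwiftModel, SageModel, should_defer_to_sage, the confidence threshold, and the environment API) is an assumption made for the example.

```python
# A minimal sketch (not the paper's implementation) of a fast/slow control loop
# in the spirit of SwiftSage. All names below -- SwiftModel, SageModel,
# should_defer_to_sage, the confidence threshold, and the environment API --
# are illustrative assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class SwiftOutput:
    action: str
    confidence: float  # e.g., a normalized decoding score from the small LM


class SwiftModel:
    """Stand-in for the small encoder-decoder LM (fast, intuitive thinking)."""

    def propose(self, observation: str, history: List[str]) -> SwiftOutput:
        # A fine-tuned seq2seq model would decode the next action here.
        return SwiftOutput(action="look around", confidence=0.9)


class SageModel:
    """Stand-in for the LLM-based subgoal planner and grounder (slow thinking)."""

    def plan(self, observation: str, history: List[str]) -> List[str]:
        # An LLM prompt for subgoal planning and action grounding would go here;
        # it returns a short buffer of grounded actions to execute.
        return ["open door to kitchen", "go to kitchen"]


def should_defer_to_sage(out: SwiftOutput, last_feedback: str,
                         conf_threshold: float = 0.5) -> bool:
    """Hypothetical switching rule: fall back to slow thinking when the fast
    module is unsure or the environment rejected the previous action."""
    return out.confidence < conf_threshold or "No known action" in last_feedback


def run_episode(env, swift: SwiftModel, sage: SageModel, max_steps: int = 50) -> float:
    """Interleave fast and slow thinking; assumes a text-game-style env whose
    step() returns (observation, reward, done, feedback)."""
    observation, history, feedback = env.reset(), [], ""
    action_buffer: List[str] = []  # actions queued by the Sage planner
    total_reward = 0.0

    for _ in range(max_steps):
        if action_buffer:  # finish the buffered slow-thinking plan first
            action = action_buffer.pop(0)
        else:
            proposal = swift.propose(observation, history)
            if should_defer_to_sage(proposal, feedback):
                action_buffer = sage.plan(observation, history)
                action = action_buffer.pop(0)
            else:
                action = proposal.action

        observation, reward, done, feedback = env.step(action)
        history.append(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

The paper's actual heuristic relies on environment feedback signals rather than a single confidence threshold; the threshold and feedback string check above are placeholders meant only to show where such a switching decision sits in the loop.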
- Do as I can, not as I say: Grounding language in robotic affordances. In Conference on Robot Learning, 2022.
- Graph constrained reinforcement learning for natural language action spaces. In International Conference on Learning Representations, 2020.
- How to motivate your dragon: Teaching goal-driven agents to speak and act in fantasy worlds. In North American Chapter of the Association for Computational Linguistics, 2020.
- Leveraging linguistic structure for open domain information extraction. In Annual Meeting of the Association for Computational Linguistics, 2015.
- Thinking fast and slow with deep learning and tree search. ArXiv, abs/1705.08439, 2017.
- Language models are few-shot learners. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
- Sparks of artificial general intelligence: Early experiments with GPT-4. ArXiv, abs/2303.12712, 2023.
- Deep reasoning networks: Thinking fast and slow. ArXiv, abs/1906.00855, 2019.
- Decision transformer: Reinforcement learning via sequence modeling. In Neural Information Processing Systems, 2021.
- Scaling instruction-finetuned language models. ArXiv, abs/2210.11416, 2022.
- TextWorld: A learning environment for text-based games. In CGW@IJCAI, 2018.
- Thinking fast and slow in AI: The role of metacognition. In International Conference on Machine Learning, Optimization, and Data Science, 2021.
- OpenAGI: When LLM meets domain experts. ArXiv, 2023.
- Deep reinforcement learning with a natural language action space. ArXiv, 2015.
- Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. ArXiv, abs/2201.07207, 2022.
- Daniel Kahneman. Thinking, Fast and Slow. 2011.
- AI2-THOR: An interactive 3D environment for visual AI. ArXiv, 2017.
- On grounded planning for embodied tasks with language models. ArXiv, abs/2209.00465, 2022.
- Chameleon: Plug-and-play compositional reasoning with large language models. ArXiv, abs/2304.09842, 2023.
- Thinking fast and slow: Efficient text-to-visual retrieval with transformers. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9821–9831, 2021.
- Text-based RL agents with commonsense knowledge: New challenges, environments and baselines. In Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021.
- Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling. In International Conference on Machine Learning (ICML), 2023.
- Improving coherence and consistency in neural sequence models with dual-system, neuro-symbolic reasoning. In Neural Information Processing Systems, 2021.
- Training language models to follow instructions with human feedback. ArXiv, abs/2203.02155, 2022.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
- A generalist agent. ArXiv, abs/2205.06175, 2022.
- Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China, 2019. Association for Computational Linguistics.
- Toolformer: Language models can teach themselves to use tools. ArXiv, abs/2302.04761, 2023.
- HuggingGPT: Solving AI tasks with ChatGPT and its friends in HuggingFace. ArXiv, abs/2303.17580, 2023.
- Reflexion: an autonomous agent with dynamic memory and self-reflection. ArXiv, abs/2303.11366, 2023.
- ALFWorld: Aligning text and embodied environments for interactive learning. ArXiv, abs/2010.03768, 2020.
- LLM-Planner: Few-shot grounded planning for embodied agents with large language models. ArXiv, abs/2212.04088, 2022.
- Sequence to sequence learning with neural networks. ArXiv, abs/1409.3215, 2014.
- Behavioral cloning from observation. ArXiv, abs/1805.01954, 2018.
- Attention is all you need. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 5998–6008, 2017.
- ScienceWorld: Is your agent smarter than a 5th grader? In Conference on Empirical Methods in Natural Language Processing, 2022.
- Interactive natural language processing. ArXiv, 2023.
- Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. ArXiv, abs/2302.01560, 2023.
- Dual processes in reasoning? Cognition, 3(2):141–154, 1974.
- Keep calm and explore: Language models for action generation in text-based games. ArXiv, abs/2010.02903, 2020.
- ReAct: Synergizing reasoning and acting in language models. ArXiv, abs/2210.03629, 2022.