
ReAct: Synergizing Reasoning and Acting in Language Models (2210.03629v3)

Published 6 Oct 2022 in cs.CL, cs.AI, and cs.LG

Abstract: While LLMs have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io

Citations (1,812)

Summary

  • The paper introduces the ReAct framework that interleaves reasoning and action steps to enhance LLM performance on complex, knowledge-intensive tasks.
  • It employs a thought-action-observation cycle that enables dynamic planning, error correction, and real-time grounding of external information.
  • Experimental results show significant improvements over chain-of-thought and acting-only baselines, offering more interpretable and effective task trajectories.

The ReAct Framework: Interleaving Reasoning and Action in LLMs

The ReAct framework proposes a method for enabling LLMs to solve complex tasks by synergistically combining reasoning and acting. Instead of treating reasoning (e.g., chain-of-thought) and acting (e.g., action plan generation) as separate capabilities, ReAct structures the LLM's operation as an interleaved sequence of thought, action, and observation steps. This approach allows the model to dynamically reason about the task, formulate actions to interact with external environments or knowledge sources, and incorporate observations from these interactions to refine its reasoning and subsequent actions. The core idea is that reasoning benefits from grounding in external information obtained via actions, while actions become more targeted and effective when guided by explicit reasoning steps.

Methodology: Thought, Action, Observation Cycle

The ReAct approach operationalizes this synergy through a specific prompting strategy. The LLM is prompted with few-shot examples demonstrating the desired interleaved pattern of thought, action, and observation. For a given task instance, the LLM iteratively generates:

  1. Thought (t_i): A natural language reasoning trace outlining the current understanding of the task, the strategy for the next step, or analysis of previous outcomes. This internal monologue helps the model decompose the problem, track progress, update plans, and handle unexpected situations.
  2. Action (a_i): A specific action intended to interact with an external source, formatted according to a predefined action space relevant to the task. Actions might include searching a knowledge base, querying an API, or interacting with a simulated environment.
  3. Observation (o_i): The feedback received from the external source after executing action a_i. This could be a snippet of text from a Wikipedia page, a result from a calculation, or the state change description from an environment simulator.

This cycle (t_i, a_i, o_i) repeats, with the context for generating the next thought (t_{i+1}) comprising the initial prompt, the task input, and the history of preceding thought-action-observation triplets. The process terminates when an action indicates the final answer is reached or a stopping criterion is met.
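As a concrete sketch, the history of (t_i, a_i, o_i) triplets can be represented with a small record type and re-serialized into the prompt context. This is an illustrative data structure of my own, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Step:
    thought: str      # t_i: free-form reasoning trace
    action: str       # a_i: e.g. "search[Nikola Tesla]"
    observation: str  # o_i: feedback from the tool or environment

def render(steps):
    """Serialize the (t_i, a_i, o_i) history back into prompt context,
    ready for generating the next thought t_{i+1}."""
    return "".join(
        f"Thought {i + 1}: {s.thought}\n"
        f"Action {i + 1}: {s.action}\n"
        f"Observation {i + 1}: {s.observation}\n"
        for i, s in enumerate(steps)
    )
```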

The action space is task-dependent. For knowledge-intensive tasks like question answering (HotpotQA) and fact verification (Fever), the action space typically includes:

  • search[entity]: Queries an external knowledge source (e.g., Wikipedia API) for information about a specific entity.
  • lookup[string]: Looks for a specific string within a retrieved document, useful for finding keywords or sentences related to the reasoning process.
  • finish[answer]: Concludes the process and outputs the final answer.
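For illustration, the lookup[string] behavior can be approximated over a cached page text as follows. This is a minimal sketch under my own assumptions (function name, sentence splitting, and the `state` dict for stepping through repeated calls are all illustrative, not the paper's Wikipedia wrapper):

```python
import re

def lookup(page_text, keyword, state):
    """Return the next sentence of page_text containing keyword.

    Mimics the lookup[string] action: repeated calls with the same
    `state` dict step through successive matching sentences.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    matches = [s for s in sentences if keyword.lower() in s.lower()]
    idx = state.get(keyword, 0)
    if idx >= len(matches):
        return f"No more results for '{keyword}'."
    state[keyword] = idx + 1
    return f"(Result {idx + 1} / {len(matches)}) {matches[idx]}"
```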

For interactive decision-making tasks like ALFWorld (text-based game simulation) and WebShop (simulated online shopping), the action space corresponds to the admissible commands within the respective environments (e.g., go to, open, click, search).

The prompting relies on few-shot learning, where 1 to 6 examples of successful ReAct trajectories for the specific task are included in the prompt given to the LLM (e.g., PaLM-540B). These examples guide the model to produce the desired interleaved structure and task-specific reasoning patterns.
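A hypothetical exemplar in this interleaved format might be assembled like the sketch below. The trajectory wording and all entities in it are invented placeholders for illustration, not taken from the paper's actual prompts:

```python
# Hypothetical HotpotQA-style exemplar; entities and wording are placeholders.
EXAMPLE_TRAJECTORY = """\
Question: In which country is the river that flows through the city of Exampleton?
Thought 1: I need to find which river flows through Exampleton, then its country.
Action 1: search[Exampleton]
Observation 1: Exampleton is a city on the River Placehold ...
Thought 2: Now I should find which country the River Placehold is in.
Action 2: search[River Placehold]
Observation 2: The River Placehold flows through Examplia ...
Thought 3: The observation states the country directly.
Action 3: finish[Examplia]
"""

def build_prompt(examples, question):
    """Concatenate few-shot exemplars and the new question, ending where
    the model should begin its first thought."""
    return "\n\n".join(examples) + f"\n\nQuestion: {question}\nThought 1:"
```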

Implementation and Deployment Considerations

Implementing ReAct involves setting up an orchestration loop that interacts with the LLM and the external tools.

  1. LLM Interface: Requires API access to a sufficiently capable LLM that can follow the structured prompting format and generate coherent thoughts and valid actions based on the provided context history.
  2. Tool Integration: Interfaces need to be built for each action type. For Wikipedia-based tasks, this involves a simple API wrapper to search and retrieve page snippets. For ALFWorld and WebShop, it requires interfacing with their respective simulation engines to execute actions and receive state observations.
  3. Prompt Engineering: Crafting effective few-shot prompts is crucial. The examples must clearly demonstrate the desired reasoning process, the correct action formatting, and how observations influence subsequent thoughts and actions.
  4. Parsing and State Management: The control loop must parse the LLM's output to distinguish thoughts from actions, validate actions against the allowed action space, execute valid actions using the appropriate tool, and format the resulting observation before appending the triplet (t_i, a_i, o_i) to the context for the next LLM call.
  5. Error Handling: The system needs to handle potential LLM errors (e.g., generating invalid actions, hallucinating within thoughts) and tool errors (e.g., API failures, environment exceptions). The reasoning capability of ReAct can be leveraged here, allowing the model to potentially recognize and recover from errors based on observations.

Computational requirements depend on the chosen LLM and the complexity/length of the tasks. Each step involves an LLM inference call, and interactions with external tools add latency. The length of the context grows with each turn, potentially hitting context window limits for very long tasks.
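One pragmatic mitigation is to drop the oldest thought-action-observation triplets once the assembled context nears the window limit. The sketch below uses a crude character budget as a stand-in for a real token count; the function name and budget are assumptions of this illustration:

```python
def trim_history(prompt, triplets, budget_chars=8000):
    """Drop the oldest thought/action/observation triplets until the
    assembled context fits a crude character budget (a stand-in for a
    proper tokenizer-based count)."""
    kept = list(triplets)
    while kept and len(prompt) + sum(len(t) for t in kept) > budget_chars:
        kept.pop(0)  # discard the oldest step first
    return prompt + "".join(kept)
```

Note that dropping early steps can discard information the model still needs; summarizing old steps instead of deleting them is a common alternative.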

import re


def react_solve(task_description, prompt_examples, llm, tools, max_steps=10):
    """
    Executes the ReAct loop for a given task.

    Args:
        task_description (str): The input query or task definition.
        prompt_examples (str): Few-shot examples demonstrating ReAct trajectories.
        llm: Interface to the LLM; must expose a generate(context) -> str method.
        tools (dict): Maps action types to tool execution functions,
                      e.g. {'search': wikipedia_search, 'lookup': lookup_string}.
        max_steps (int): Upper bound on thought-action-observation cycles.

    Returns:
        str: The final answer, or a failure message if max_steps is exhausted.
    """
    context = prompt_examples + "\n\nTask: " + task_description + "\n"

    for i in range(max_steps):
        # 1. Generate the next thought and action from the accumulated context.
        response = llm.generate(context)
        thought = parse_thought(response)
        action_str = parse_action(response)
        action_type, action_arg = parse_action_details(action_str)

        context += f"Thought {i + 1}: {thought}\n"
        context += f"Action {i + 1}: {action_str}\n"

        # 2. Execute the action and obtain an observation.
        if action_type == "finish":
            return action_arg  # finish[answer] terminates the loop

        if action_type in tools:
            try:
                observation = tools[action_type](action_arg)
            except Exception as e:  # a tool/API failure becomes an observation
                observation = f"Error executing action: {e}"
        else:
            observation = f"Error: Unknown action type '{action_type}'."

        # 3. Append the observation so the next thought can react to it.
        context += f"Observation {i + 1}: {observation}\n"

    return "Max steps reached without finishing."


def parse_thought(response):
    """Extract the text after 'Thought ...:' from the LLM response."""
    match = re.search(r"Thought(?:\s*\d+)?:\s*(.*)", response)
    return match.group(1).strip() if match else response.strip()


def parse_action(response):
    """Extract the text after 'Action ...:' from the LLM response."""
    match = re.search(r"Action(?:\s*\d+)?:\s*(.*)", response)
    return match.group(1).strip() if match else ""


def parse_action_details(action_str):
    """Split e.g. 'search[Python]' into ('search', 'Python')."""
    match = re.match(r"(\w+)\[(.*)\]", action_str)
    return (match.group(1), match.group(2)) if match else (action_str, "")

Experimental Results and Analysis

ReAct was evaluated against several baselines across different task types using PaLM-540B.

  • Knowledge-Intensive Tasks (HotpotQA, Fever):
    • Baselines included standard few-shot prompting (Standard), Chain-of-Thought prompting (CoT), and an Acting-only variant (Act).
    • On HotpotQA, ReAct achieved a score of 71.2, significantly outperforming CoT (56.8) and Act (51.1). It demonstrated a better ability to retrieve supporting facts through search actions and decompose the question via reasoning, mitigating hallucination issues observed in CoT.
    • On Fever (fact verification), ReAct achieved an accuracy of 87.3, compared to 83.0 for CoT and 85.5 for Act. ReAct trajectories showed explicit steps of searching for evidence related to the claim and then reasoning about its veracity based on the retrieved information. Qualitative analysis highlighted ReAct's ability to recover from initial incorrect searches by reasoning about the lack of relevant information in the observation and formulating a new search query.
  • Interactive Decision-Making Tasks (ALFWorld, WebShop):
    • Baselines included Act (acting-only LLM), and domain-specific methods like imitation learning (IL) using Behavior Cloning (BC) and reinforcement learning (RL) for ALFWorld (BUTLER).
    • On ALFWorld (commonsense reasoning in simulated household environments), ReAct achieved a success rate of 71%, a substantial improvement over Act (37%) and the prior state-of-the-art IL/RL methods (BUTLER: 37%). ReAct needed only 2 in-context examples compared to the large expert datasets required by IL/RL.
    • On WebShop (goal-oriented web navigation and shopping), ReAct achieved a success rate of 32.0 (averaged over 500 items), compared to 22.0 for Act and 18.8 for a specialized IL method (using HTML inputs). ReAct demonstrated more effective planning and adaptation within the complex state space of the simulated web environment.

Across tasks, ReAct consistently outperformed both reasoning-only (CoT) and acting-only (Act) baselines, supporting the central hypothesis that synergizing the two leads to improved performance. The generated trajectories were also found to be more interpretable, as the thought steps provided explicit insights into the model's reasoning process, making it easier to diagnose failures and understand successes. ReAct effectively uses actions to ground reasoning and mitigate hallucination by fetching external information, while using thoughts to maintain and adapt high-level plans during potentially long action sequences.

Synergy Dynamics

The effectiveness of ReAct stems from the bidirectional benefits between reasoning and acting:

  • Reasoning Enhances Acting:
    • High-level planning: Thoughts allow the model to decompose complex goals into sequences of simpler actions.
    • Strategic exploration: Reasoning helps decide which actions are most promising for information gain or goal progression.
    • Error detection/Correction: Thoughts can identify when an action failed or yielded unexpected results (based on observation), prompting corrective actions or plan adjustments (e.g., "The search for X didn't work, let me try searching for Y instead").
    • Maintaining context: For long trajectories, thoughts help track the overall goal and progress made so far.
  • Acting Enhances Reasoning:
    • Grounding: Actions fetch real-time, external information, preventing the model from relying solely on its potentially outdated or incorrect internal knowledge (mitigating hallucination).
    • Information gathering: Actions provide specific, targeted information needed to answer questions or verify facts, which may not be present in the initial context.
    • Exploring consequences: In interactive environments, actions reveal the results of decisions, allowing the model to reason about cause and effect within the environment dynamics.

Limitations

The paper acknowledges several limitations:

  • Increased Steps/Tokens: ReAct trajectories are often longer and involve more LLM calls and token processing compared to CoT or standard prompting due to the interleaved structure and interactions.
  • Prompt Sensitivity: Performance relies heavily on the quality and relevance of the few-shot examples provided in the prompt.
  • Action Space Design: Defining an appropriate and effective action space is crucial and task-dependent.
  • Potential for Hallucination in Thoughts: While acting mitigates hallucination regarding external facts, the reasoning steps (thoughts) themselves can still contain logical fallacies or internal hallucinations, potentially leading actions astray.

Conclusion

ReAct presents a compelling framework for enhancing LLM capabilities by explicitly interleaving reasoning traces and actions that interact with external sources. Its demonstrated performance improvements on diverse knowledge-intensive and decision-making tasks highlight the benefits of this synergy. By generating interpretable thought-action-observation trajectories, ReAct allows LLMs to dynamically plan, gather information, and adapt to task requirements, overcoming limitations associated with purely reasoning-based or action-based approaches and offering a promising direction for building more capable and trustworthy autonomous agents.

Explain it Like I'm 14

Overview

This paper introduces ReAct, a simple way to help big LLMs “think” and “do” at the same time. Instead of only writing out their reasoning (like showing steps for a math problem) or only taking actions (like clicking buttons on a website), ReAct lets the model switch between thinking in words and acting in the world. This makes the model more accurate, less likely to make things up, and easier for people to understand and trust.

Key Objectives

Here are the main questions the researchers wanted to answer:

  • Can an LLM solve problems better if it combines reasoning (thinking step by step) with acting (using tools like Wikipedia or navigating websites)?
  • Does this combination reduce mistakes like “hallucinations” (making up facts)?
  • Is the model’s process easier for humans to follow and check?
  • Can this work well with very few examples, compared to training-heavy methods like imitation or reinforcement learning?

Methods and Approach

Think of ReAct as teaching a robot-chef how to cook by both thinking out loud and doing things in the kitchen. It might say, “I need to boil water,” and then actually turn on the stove, check the pot, and adjust the plan if it’s missing salt. ReAct brings this “think–act–observe” loop to LLMs.

  • LLMs: These are powerful AI systems that read and write text. The paper mostly uses PaLM, a very large language model from Google.
  • Prompting: Instead of retraining the model, they show it a few example “trajectories” of solving tasks. Each example has “Thought” and “Act” steps, plus what the environment shows back (“Observation”).
  • Reasoning (“Thought”): The model writes down its plan, remembers what it has learned, and updates its strategy. This can include breaking down a big goal, using commonsense, and deciding what to do next.
  • Acting (“Act”): The model takes a task-specific action. For example:
    • In question answering, it can search Wikipedia or look up parts of a page.
    • In games, it can move to a location, open a drawer, or pick up an object.
    • On shopping sites, it can search for products, click options, and decide to buy.
  • Observations: After acting, the environment replies (like showing a web page, a room’s contents, or product options). The model reads this and uses it in its next thought.
  • Tasks tested:
    • HotpotQA: Multi-step question answering using Wikipedia.
    • FEVER: Fact checking claims using Wikipedia.
    • ALFWorld: A text-based household game where an agent completes tasks (like “put the pepper shaker in the drawer”).
    • WebShop: An online shopping environment with real product listings; the agent tries to buy exactly what the user asked for.
  • Baselines (comparisons):
    • Reason-only (Chain-of-Thought or CoT): Thinks in steps but doesn’t use tools.
    • Act-only: Takes actions but doesn’t think out loud.
    • ReAct: Interleaves both.
    • They also try combining ReAct with “self-consistency” CoT (asking the model multiple times and choosing the most common answer) to blend internal knowledge with external information.
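The "self-consistency" idea mentioned above boils down to sampling several answers and taking a majority vote. A rough sketch (the normalization and tie-breaking choices here are my own):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over multiple sampled chain-of-thought answers.
    Answers are normalized (trimmed, lowercased); ties go to the
    first-seen answer because sorting is stable."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][0]
```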

Main Findings and Why They Matter

  • ReAct reduces hallucinations: When models only “think” using their internal memory, they can invent facts. ReAct helps the model check facts by acting (like searching Wikipedia), making the process more grounded and trustworthy.
  • Better performance on interactive tasks:
    • ALFWorld: ReAct beats both act-only and learning-based baselines by a large margin. In some cases, it improves success rates by an absolute 34%.
    • WebShop: ReAct outperforms imitation and reinforcement learning methods, raising success rate by about 10% with just one example in the prompt.
  • Competitive on knowledge tasks and best when combined:
    • HotpotQA and FEVER: ReAct is competitive with pure reasoning methods and clearly better at staying factual. The best results come from combining ReAct with self-consistent CoT—using both internal reasoning and external info smartly.
  • More interpretable and controllable: Because ReAct writes down its thoughts and actions, humans can see why decisions were made, spot errors, and even correct the model mid-process.
  • Finetuning boosts ReAct further: When they trained smaller models on thousands of ReAct-style examples, ReAct scaled well—doing better than just training on reasoning-only or action-only data.

These results matter because they show that letting AI both think and interact can make it smarter, safer, and more useful in real situations—like answering tricky questions or navigating websites to complete tasks.

Implications and Impact

  • Smarter assistants: Future AI helpers could solve complex tasks by planning, checking facts, and adapting as they go—like booking trips, troubleshooting devices, or researching topics.
  • Safer, more trustworthy AI: ReAct’s grounded approach helps avoid made-up facts and makes decisions clearer to users.
  • Efficient learning: With just a few examples, ReAct can match or beat methods that need huge training datasets, making it practical for many applications.
  • Path to stronger agents: Combining ReAct with reinforcement learning and training on more tasks could create more capable general-purpose agents.
  • Responsible design: The authors note that connecting models to external actions must be done carefully to avoid harmful behavior. In their experiments, actions were limited to safe environments like Wikipedia and a research shopping site.

In short, ReAct shows that blending “thinking” and “doing” helps AI solve problems more accurately and transparently—moving closer to how people naturally approach tasks.
