ReAct: Synergizing Reasoning and Acting in Language Models (2210.03629v3)
Abstract: While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io
Explain it Like I'm 14
Overview
This paper introduces ReAct, a simple way to help large language models (LLMs) “think” and “do” at the same time. Instead of only writing out their reasoning (like showing steps for a math problem) or only taking actions (like clicking buttons on a website), ReAct lets the model switch between thinking in words and acting in the world. This makes the model more accurate, less likely to make things up, and easier for people to understand and trust.
Key Objectives
Here are the main questions the researchers wanted to answer:
- Can an LLM solve problems better if it combines reasoning (thinking step by step) with acting (using tools like Wikipedia or navigating websites)?
- Does this combination reduce mistakes like “hallucinations” (making up facts)?
- Is the model’s process easier for humans to follow and check?
- Can this work well with very few examples, compared to training-heavy methods like imitation or reinforcement learning?
Methods and Approach
Think of ReAct as teaching a robot-chef how to cook by both thinking out loud and doing things in the kitchen. It might say, “I need to boil water,” and then actually turn on the stove, check the pot, and adjust the plan if it’s missing salt. ReAct brings this “think–act–observe” loop to LLMs.
- LLMs: These are powerful AI systems that read and write text. The paper mostly uses PaLM, a very large language model from Google.
- Prompting: Instead of retraining the model, they show it a few example “trajectories” of solving tasks. Each example has “Thought” and “Act” steps, plus what the environment shows back (“Observation”).
- Reasoning (“Thought”): The model writes down its plan, remembers what it has learned, and updates its strategy. This can include breaking down a big goal, using commonsense, and deciding what to do next.
- Acting (“Act”): The model takes a task-specific action. For example:
- In question answering, it can search Wikipedia or look up parts of a page.
- In games, it can move to a location, open a drawer, or pick up an object.
- On shopping sites, it can search for products, click options, and decide to buy.
- Observations: After acting, the environment replies (like showing a web page, a room’s contents, or product options). The model reads this and uses it in its next thought.
- Tasks tested:
- HotpotQA: Multi-step question answering using Wikipedia.
- FEVER: Fact checking claims using Wikipedia.
- ALFWorld: A text-based household game where an agent completes tasks (like “put the pepper shaker in the drawer”).
- WebShop: An online shopping environment with real product listings; the agent tries to buy exactly what the user asked for.
- Baselines (comparisons):
- Reason-only (Chain-of-Thought or CoT): Thinks in steps but doesn’t use tools.
- Act-only: Takes actions but doesn’t think out loud.
- ReAct: Interleaves both.
- They also try combining ReAct with “self-consistency” CoT (asking the model multiple times and choosing the most common answer) to blend internal knowledge with external information.
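The “Thought → Act → Observation” loop above can be sketched in a few lines of Python. This is an illustrative toy, not the paper’s code: `fake_model` and `fake_wiki` are stand-ins for a real LLM and the Wikipedia API, and the `Search[...]`/`Finish[...]` action format is a simplified version of the one used in the paper.

```python
def fake_wiki(query):
    """Stub for the Wikipedia search tool; returns an 'observation' string."""
    pages = {"Apple Remote": "The Apple Remote was designed to control "
                             "the Front Row media program."}
    return pages.get(query, "No results found.")

def fake_model(prompt):
    """Stub LLM: emits a canned Thought/Action pair based on the transcript so far."""
    if "Observation:" not in prompt:
        return "Thought: I should look up the Apple Remote.\nAction: Search[Apple Remote]"
    return "Thought: The observation answers the question.\nAction: Finish[Front Row]"

def react_loop(question, max_steps=5):
    """Interleave model Thoughts/Actions with environment Observations."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_model(prompt)          # model emits a Thought and an Action
        prompt += step + "\n"
        action = step.split("Action: ")[-1]
        if action.startswith("Finish["):   # terminal action carries the answer
            return action[len("Finish["):-1]
        if action.startswith("Search["):   # tool call: query the (stub) Wikipedia API
            obs = fake_wiki(action[len("Search["):-1])
            prompt += f"Observation: {obs}\n"
    return None                            # gave up within the step budget

answer = react_loop("What program was the Apple Remote designed to control?")
print(answer)  # Front Row
```

Note that the whole trajectory is just a growing text prompt: each new Thought, Action, and Observation is appended, so the model always conditions on everything it has thought, done, and seen so far.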
Main Findings and Why They Matter
- ReAct reduces hallucinations: When models only “think” using their internal memory, they can invent facts. ReAct helps the model check facts by acting (like searching Wikipedia), making the process more grounded and trustworthy.
- Better performance on interactive tasks:
- ALFWorld: ReAct beats both act-only prompting and imitation/reinforcement learning baselines by a large margin, improving the success rate by an absolute 34%.
- WebShop: ReAct outperforms imitation and reinforcement learning methods, raising success rate by about 10% with just one example in the prompt.
- Competitive on knowledge tasks and best when combined:
- HotpotQA and FEVER: ReAct is competitive with pure reasoning methods and clearly better at staying factual. The best results come from combining ReAct with self-consistent CoT—using both internal reasoning and external info smartly.
- More interpretable and controllable: Because ReAct writes down its thoughts and actions, humans can see why decisions were made, spot errors, and even correct the model mid-process.
- Finetuning boosts ReAct further: When they trained smaller models on thousands of ReAct-style examples, ReAct scaled well—doing better than just training on reasoning-only or action-only data.
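The “combine with self-consistency” idea can be made concrete with a small sketch. This is a simplified, hypothetical rendering of one of the paper’s two combination strategies (back off from CoT-SC to ReAct when the majority vote is weak); `react_fallback` stands in for actually running a ReAct trajectory.

```python
from collections import Counter

def self_consistency(cot_answers):
    """Majority vote over several sampled chain-of-thought answers."""
    return Counter(cot_answers).most_common(1)[0]  # (answer, vote count)

def cot_sc_then_react(cot_answers, react_fallback):
    """Trust the CoT-SC majority only when it wins more than half the
    samples; otherwise fall back to a grounded ReAct trajectory."""
    answer, votes = self_consistency(cot_answers)
    if votes > len(cot_answers) / 2:
        return answer            # internal knowledge is confident enough
    return react_fallback()      # low agreement: go check external sources

result = cot_sc_then_react(
    ["Front Row", "Front Row", "iTunes"],   # sampled CoT answers
    react_fallback=lambda: "Front Row",     # stand-in for running ReAct
)
```

The intuition is that high agreement among independent reasoning samples signals the model’s internal knowledge is reliable, while disagreement is a cue to go gather external evidence instead.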
These results matter because they show that letting AI both think and interact can make it smarter, safer, and more useful in real situations—like answering tricky questions or navigating websites to complete tasks.
Implications and Impact
- Smarter assistants: Future AI helpers could solve complex tasks by planning, checking facts, and adapting as they go—like booking trips, troubleshooting devices, or researching topics.
- Safer, more trustworthy AI: ReAct’s grounded approach helps avoid made-up facts and makes decisions clearer to users.
- Efficient learning: With just a few examples, ReAct can match or beat methods that need huge training datasets, making it practical for many applications.
- Path to stronger agents: Combining ReAct with reinforcement learning and training on more tasks could create more capable general-purpose agents.
- Responsible design: The authors note that connecting models to external actions must be done carefully to avoid harmful behavior. In their experiments, actions were limited to safe environments like Wikipedia and a research shopping site.
In short, ReAct shows that blending “thinking” and “doing” helps AI solve problems more accurately and transparently—moving closer to how people naturally approach tasks.