
Tree Search for Language Model Agents (2407.01476v3)

Published 1 Jul 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Autonomous agents powered by LMs have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards addressing this, we propose an inference-time search algorithm for LM agents to explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space, and is complementary with most existing state-of-the-art agents. It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks. On the challenging VisualWebArena benchmark, applying our search algorithm on top of a GPT-4o agent yields a 39.7% relative increase in success rate compared to the same baseline without search, setting a state-of-the-art success rate of 26.4%. On WebArena, search also yields a 28.0% relative improvement over a baseline agent, setting a competitive success rate of 19.2%. Our experiments highlight the effectiveness of search for web agents, and we demonstrate that performance scales with increased test-time compute. We conduct a thorough analysis of our results to highlight improvements from search, limitations, and promising directions for future work. Our code and models are publicly released at https://jykoh.com/search-agents.

Summary

  • The paper presents a novel tree search algorithm that integrates with LM agents to enhance planning and decision-making in complex web environments.
  • It employs a model-based value function with best-first search, achieving a 39.7% relative improvement on VWA tasks and a 28.0% relative improvement on WA tasks.
  • The method mitigates common LM failure modes by using backtracking and trajectory pruning to effectively navigate vast action spaces.

Tree Search for Language Model Agents

The paper "Tree Search for Language Model Agents" explores the integration of tree search with LM agents, aiming to enhance their capability for multi-step planning and exploration in web environments. The authors propose an inference-time search algorithm that improves the performance of LM agents on challenging web tasks by leveraging environmental feedback to navigate vast action spaces.

Introduction

Existing language model (LM) agents demonstrate proficiency in natural language understanding and generation but often struggle with tasks requiring complex reasoning, planning, and processing of environmental feedback. In open-ended web environments the action space is vast, so efficient exploration and trajectory pruning are necessary to achieve task objectives. The authors propose a best-first tree search algorithm that operates in the real environment to improve success rates on web benchmarks such as VisualWebArena (VWA) and WebArena (WA).

The algorithm yields considerable gains over baseline agents: a 39.7% relative improvement on VWA tasks and a 28.0% relative improvement on WA tasks when integrated with a GPT-4o agent. The authors also show that performance scales with increased test-time compute, marking substantial progress toward autonomous web task execution.

Methodology

Agent Backbone

Most state-of-the-art agents use LLMs, often multimodal ones, conditioned on the current observation to predict the next action. Prompting strategies such as ReAct, RCI, and chain-of-thought (CoT) further improve agent performance. The proposed tree search is complementary to these agents: it enhances them by sampling diverse candidate actions and enabling broader, deeper exploration during inference.
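To make the action-sampling idea concrete, here is a minimal sketch (my own illustration, not the paper's code) of drawing several candidate actions from an LM at nonzero temperature and deduplicating them by frequency. The `lm` callable and the `click`/`type` action strings are stand-ins for the real agent's model call and WebArena-style action format.

```python
from collections import Counter

def sample_candidate_actions(lm, observation, n=5):
    """Sample n actions from the agent's LM and deduplicate them,
    ordered by how often each was sampled. `lm` is a stand-in for any
    callable returning one action string per call; the real agent
    conditions a (multimodal) LM on the current page observation."""
    samples = [lm(observation) for _ in range(n)]
    # More frequently sampled actions are tried first during expansion.
    return [action for action, _ in Counter(samples).most_common()]

# Toy stand-in model: cycles through a fixed list of completions.
def make_stub(completions):
    it = iter(completions)
    return lambda obs: next(it)

lm = make_stub(["click [5]", "click [5]", "type [3] laptop",
                "click [5]", "type [3] laptop"])
ranked = sample_candidate_actions(lm, observation="<page>", n=5)
# ranked == ["click [5]", "type [3] laptop"]
```

Ranking by sample frequency gives the search a crude prior over which branches the model itself considers most plausible.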

Value Function

The authors employ a model-based value function to guide the search, estimating the expected reward associated with each state. This heuristic evaluates the environmental observation against the task instruction to judge how likely the state is to satisfy the task. Using a multimodal LM, they apply self-consistency prompting, aggregating multiple sampled judgments, to refine these value estimates, improving search efficiency and reliability.
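A minimal sketch of the self-consistency idea, under the assumption that the judge model returns a scalar score per call (the `judge` callable and its prompt format are my stand-ins, not the paper's exact implementation):

```python
def estimate_value(judge, task, observation, n=3):
    """Self-consistency value estimate: sample n independent judgments
    of how likely the current state is to satisfy the task, then
    average them. `judge` stands in for a multimodal LM prompted with
    the task instruction and the current observation, returning a
    score in [0, 1] per call."""
    scores = [judge(task, observation) for _ in range(n)]
    return sum(scores) / len(scores)

# Noisy stub judge: returns preset scores in order, mimicking the
# variance of repeated LM sampling.
def make_judge(scores):
    it = iter(scores)
    return lambda task, obs: next(it)

judge = make_judge([0.2, 0.6, 0.4])
v = estimate_value(judge, "buy the cheapest laptop", "<screenshot>", n=3)
# v is close to 0.4
```

Averaging several noisy judgments reduces the variance of any single LM call, which matters because the search ranks frontier nodes by this score.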

Search Algorithm

Inspired by A* search, the algorithm employs a best-first strategy augmented by LM-driven action sampling. It iteratively constructs a search tree, scoring nodes with the value function and expanding the most promising branches up to a defined depth and search budget. Because expansion is bounded by the budget, the method can exploit additional test-time compute without becoming computationally prohibitive.
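The loop can be sketched as follows. This is a simplified reconstruction under assumed interfaces, not the released code: `env.reset()` returns the root observation, `env.replay(actions)` resets and re-executes an action prefix (which is how backtracking works here), `agent(obs)` proposes candidate actions, and `value(obs)` scores an observation in [0, 1].

```python
import heapq
import itertools

def best_first_search(env, agent, value, budget=20, branching=5,
                      max_depth=5, threshold=0.9):
    """Best-first search over action sequences, guided by a value
    function. Returns the best action sequence found within budget."""
    tie = itertools.count()                      # heap tie-breaker
    root = env.reset()
    frontier = [(-value(root), next(tie), [])]   # max-heap via negation
    best_actions, best_score = [], value(root)
    for _ in range(budget):
        if not frontier:
            break
        neg_v, _, actions = heapq.heappop(frontier)
        if -neg_v >= threshold:                  # confident enough: commit
            return actions
        if len(actions) >= max_depth:
            continue
        obs = env.replay(actions)                # backtrack to this node
        for a in agent(obs)[:branching]:
            child = env.replay(actions + [a])
            v = value(child)
            heapq.heappush(frontier, (-v, next(tie), actions + [a]))
            if v > best_score:
                best_actions, best_score = actions + [a], v
    return best_actions

# Toy environment: the "state" is just the concatenated action string.
class ToyEnv:
    def reset(self):
        return ""
    def replay(self, actions):
        return "".join(actions)

# Goal: reach state "ab"; the value function peaks at the goal.
def toy_value(obs):
    return {"ab": 1.0, "a": 0.5}.get(obs, 0.0)

plan = best_first_search(ToyEnv(), lambda obs: ["a", "b"], toy_value)
# plan == ["a", "b"]
```

Note how the value-threshold early exit lets the agent commit as soon as a state looks sufficiently promising, while the budget caps total environment interaction.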

Figure 1: Our proposed search algorithm. At each iteration, we pick the next state s_p to expand from the frontier F and compute a score v for it using the value function.

Experimental Results

Experiments across 910 tasks on VWA and 812 tasks on WA demonstrate significant improvements, setting new state-of-the-art success rates with tree search integration. The analysis also shows that the algorithm benefits from additional test-time compute: success rates improve as the search budget is increased.

Figure 2: Success rate on a subset of 200 VWA tasks with search budget c. c=0 indicates no search is performed. Success rate generally increases as c increases.

Qualitative Analysis

The tree search methodology enhances robustness by allowing agents to explore beyond the first sampled action, backtracking to prune non-viable paths. This addresses typical LM-agent failure modes, such as compounded errors and decision loops, by providing explicit rollbacks to alternative, potentially successful trajectories.

Figure 3: Search can improve robustness by backtracking from bad actions. Shown above is a trajectory for VWA classifieds task #48 where greedily picking the first sampled actions would have led to a failure (the path in the first row).

Limitations and Future Work

While effective, the current search implementation can be computationally expensive, notably due to the backtracking mechanism requiring full environment resets and action sequence replays. Practical deployment would require refined strategies to reduce overhead and potentially incorporate non-destructive action criteria. Future work might explore more sophisticated search algorithms or value functions that further streamline these processes, enhancing scalability and responsiveness in real-world applications.
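A back-of-the-envelope calculation makes the replay overhead concrete. This accounting is my own rough model, not the paper's measurement: under a reset-and-replay scheme, each node expansion replays the node's action prefix and then, for each sampled candidate, replays the prefix plus one new action.

```python
def approx_env_steps(budget, branching, avg_depth):
    """Rough count of environment actions executed by a reset-and-
    replay search: each of `budget` expansions replays its prefix
    (~avg_depth steps) and then replays prefix-plus-one-action for
    each of `branching` candidates."""
    return budget * (avg_depth + branching * (avg_depth + 1))

greedy_steps = 5  # a no-search agent simply acts ~trajectory-depth times
search_steps = approx_env_steps(budget=20, branching=5, avg_depth=3)
# search_steps == 460, roughly two orders of magnitude above greedy
```

Even for modest budgets, the multiplicative prefix replays dominate the cost, which is why the authors point to cheaper state restoration and non-destructive action handling as directions for practical deployment.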

Conclusion

This paper introduces a meaningful advancement in LM agent capabilities through tree search integration, demonstrating improved success rates and efficiency on complex web tasks. The algorithm provides a scalable framework for expanding web agent capabilities and offers insights into broader applications requiring intricate planning and execution, making it a promising direction for future AI research and development.
