
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2305.10601v2)

Published 17 May 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for LLM inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting LLMs, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances LLMs' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-LLM.


Summary

  • The paper introduces a novel Tree of Thoughts framework that enhances LLM problem-solving by exploring multiple reasoning paths.
  • It leverages thought decomposition, generation, and state evaluation using classical search algorithms like BFS and DFS to improve task outcomes.
  • Empirical results in games, creative writing, and mini crosswords demonstrate notable performance gains over traditional chain-of-thought methods.

Tree of Thoughts: Deliberate Problem Solving with LLMs

Introduction

The paper "Tree of Thoughts: Deliberate Problem Solving with LLMs" introduces a framework to address limitations of current LLMs on problem-solving tasks that require exploration, strategic lookahead, or adaptive decision-making. The proposed framework, Tree of Thoughts (ToT), extends standard left-to-right autoregressive decoding by maintaining a search tree over multiple reasoning paths. The authors draw inspiration from human cognitive models, specifically dual process theory, to blend fast, automatic generation with slow, deliberate search.

Framework Details

The ToT framework generalizes the Chain of Thought (CoT) approach, structuring problem-solving as search over a tree whose nodes represent intermediate steps, or "thoughts." It leverages classical AI search algorithms such as BFS and DFS, evaluating partial solutions and backtracking when necessary.

Figure 1: Schematic illustrating various approaches to problem solving with LLMs. Each rectangle box represents a thought, which is a coherent language sequence that serves as an intermediate step toward problem solving.

Key Components:

  1. Thought Decomposition: Task-specific decomposition of problems into coherent thought steps allows for diversity and flexibility in problem-solving.
  2. Thought Generation: Leveraging LLMs to generate multiple plausible continuations for a given state, either through i.i.d. sampling or sequential proposals.
  3. State Evaluation: Using the LLM itself as a heuristic to score states, either by assigning scalar values or by voting across candidates, to guide the search.
  4. Search Algorithm: Integrating BFS or DFS in the ToT framework facilitates efficient exploration and ensures adaptability across different tasks.
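The four components above can be sketched as a short BFS loop. This is a minimal illustration, not the paper's code: `propose_thoughts` and `score_state` stand in for the LLM's generation and evaluation calls, stubbed here with deterministic placeholders so the loop is runnable.

```python
def propose_thoughts(state, k):
    """Generate k candidate next thoughts for a state (stubbed LM call)."""
    return [state + [f"step{len(state)}-{i}"] for i in range(k)]

def score_state(state):
    """Heuristic value of a partial solution (stubbed LM evaluation)."""
    return -len(state[-1]) if state else 0.0  # placeholder; a real LM would score here

def tot_bfs(root, steps=3, k=3, beam=2):
    """Keep the `beam` best states at each depth, as in ToT-BFS."""
    frontier = [root]
    for _ in range(steps):
        # expand every frontier state into k candidates, then prune to the beam
        candidates = [s for state in frontier for s in propose_thoughts(state, k)]
        candidates.sort(key=score_state, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]

print(tot_bfs([]))  # → ['step0-0', 'step1-0', 'step2-0']
```

The beam width `beam` and branching factor `k` correspond to the paper's breadth limit and number of sampled thoughts per state; swapping the sort-and-truncate step for a stack yields the DFS variant.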

Experiments

Game of 24, Creative Writing, and Mini Crosswords are used to empirically validate the efficacy of ToT against standard input-output (IO) and chain-of-thought (CoT) prompting baselines.

Game of 24

In this mathematical task, four numbers must be combined with basic arithmetic operations to reach the target 24, demanding strategic equation formulation. ToT's deliberate search drastically outperformed CoT prompting, raising GPT-4's success rate from 4% to 74%.

Figure 2: ToT in a game of 24. The LM is prompted for (a) thought generation and (b) valuation.
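To see why the task rewards search, consider a plain brute-force solver (not the paper's method, just an illustration of the combinatorial space ToT's thoughts navigate). It enumerates number orderings, operators, and two of the possible parenthesizations, which already suffices for many instances:

```python
from itertools import permutations, product

def solve24(nums, target=24, eps=1e-6):
    """Return an expression reaching `target` from the four numbers, or None."""
    ops = "+-*/"
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product(ops, repeat=3):
            # two parenthesization shapes: ((a?b)?c)?d and (a?b)?(c?d)
            for expr in (f"(({a}{o1}{b}){o2}{c}){o3}{d}",
                         f"({a}{o1}{b}){o2}({c}{o3}{d})"):
                try:
                    if abs(eval(expr) - target) < eps:
                        return expr
                except ZeroDivisionError:
                    continue  # e.g. dividing by (c - d) when c == d
    return None

print(solve24([4, 9, 10, 13]))  # an input the paper uses; (10-4)*(13-9) = 24
```

Each intermediate expression corresponds to a "thought" in ToT: rather than enumerating everything, the LM proposes promising partial equations and the evaluator prunes dead ends.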

Creative Writing

This task required generating coherent passages from random sentences. ToT successfully used strategic planning and self-assessment to produce more coherent narratives, as confirmed by both automated and human evaluations.

Figure 3: A step of deliberate search in a randomly picked Creative Writing task. Given the input, the LM samples 5 different plans, then votes 5 times to decide which plan is best.
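The vote-based evaluation in this step can be sketched as follows. The repeated voting calls to the LM are stubbed with a fixed list of vote indices; in the actual setup each vote would come from an independent LM call comparing the sampled plans.

```python
from collections import Counter

def pick_plan(plans, votes):
    """Return the plan with the most votes; `votes` holds one plan index per voting call."""
    tally = Counter(votes)
    best_idx, _ = tally.most_common(1)[0]
    return plans[best_idx]

plans = ["plan A", "plan B", "plan C", "plan D", "plan E"]
votes = [1, 3, 1, 1, 0]           # stubbed: 5 voting calls, each naming a plan index
print(pick_plan(plans, votes))    # → plan B
```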

Mini Crosswords

ToT's DFS approach allowed backtracking and pruning of unpromising thoughts, achieving a word-level success rate of 60% and demonstrating its strength on tasks requiring systematic exploration of combinatorial problem spaces.

Figure 4: In Mini Crosswords, (a) how thoughts are proposed and aggregated in a priority queue for DFS, and (b) how a state is evaluated based on the possibility of filling in each remaining word clue, and pruned if any remaining clue is deemed not possible to fill by the LM.
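The propose-evaluate-prune loop described in the caption amounts to DFS with an admissibility check before each descent. In the sketch below, `candidates` and `promising` stand in for the LM's proposal and evaluation calls (assumptions, not the paper's API); the stub searches digit strings whose digits sum to a target, a stand-in for filling crossword clues:

```python
def candidates(state):
    """Propose next thoughts for a partial solution (stubbed LM proposal call)."""
    return [state + d for d in "0123"]

def promising(state, target=6, length=3):
    """Prune states the evaluator deems impossible to complete (stubbed LM check)."""
    s = sum(int(c) for c in state)
    # each remaining position can contribute at most 3
    return s <= target and s + 3 * (length - len(state)) >= target

def dfs(state="", target=6, length=3):
    if len(state) == length:
        return state if sum(int(c) for c in state) == target else None
    for nxt in candidates(state):
        if not promising(nxt, target, length):
            continue  # backtrack: evaluator says no completion is possible
        found = dfs(nxt, target, length)
        if found:
            return found
    return None

print(dfs())  # → 033
```

The pruning check is what makes the search tractable: branches the evaluator rules out are never expanded, mirroring how ToT discards crossword states where some remaining clue cannot be filled.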

Implications and Future Directions

By integrating classical AI search methods with modern LLMs, the ToT framework potentially broadens their applicability to complex decision-making and real-world applications. While it demonstrates strong results on specific hard tasks, the framework also motivates future research on enhancing LLMs' deliberative capabilities, optimizing computational cost, and fine-tuning models for thought generation and evaluation. The modular nature of ToT further suggests extensions involving hybrid models that combine different AI paradigms.

Conclusion

The Tree of Thoughts framework significantly extends the problem-solving capabilities of LLMs by incorporating deliberate reasoning and strategic exploration. As AI systems find increasing deployment in environments demanding complex reasoning and decision-making, frameworks like ToT are poised to play a crucial role in advancing the field.
