Emergent Mind

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Published May 17, 2023 in cs.CL, cs.AI, and cs.LG


Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.

Game of 24 showing thought generation (a) and valuation (b) through language model (LM) prompting.
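To make the figure concrete, here is a minimal sketch of both prompts for Game of 24 with the LM swapped out. The hypothetical `propose` enumerates every legal next step in an "a op b = c (left: ...)" style similar to the paper's propose prompt. The hypothetical `value` replaces the LM's sure/likely/impossible judgment with an exhaustive solvability check. In ToT itself, the LM fills both roles through prompting. Brute force only works here because the puzzle is tiny, and because it is exact it never answers "likely".

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Tuple


def propose(numbers: List[Fraction]) -> List[Tuple[str, List[Fraction]]]:
    """Hypothetical thought generator: every way to combine two of the
    remaining numbers with +, -, *, /, returned as (step_text, numbers_left)."""
    thoughts = []
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        steps = [(a, "+", b, a + b), (a, "*", b, a * b),
                 (a, "-", b, a - b), (b, "-", a, b - a)]
        if b != 0:
            steps.append((a, "/", b, a / b))
        if a != 0:
            steps.append((b, "/", a, b / a))
        for x, op, y, result in steps:
            left = sorted(rest + [result])
            text = f"{x!s} {op} {y!s} = {result!s} (left: {' '.join(map(str, left))})"
            thoughts.append((text, left))
    return thoughts


def can_reach_24(numbers: List[Fraction]) -> bool:
    """Exhaustive check: can these numbers be combined into exactly 24?"""
    if len(numbers) == 1:
        return numbers[0] == 24
    return any(can_reach_24(left) for _, left in propose(numbers))


def value(numbers: List[Fraction]) -> str:
    """Stand-in for the LM's value prompt; exact, so it never says 'likely'."""
    return "sure" if can_reach_24(numbers) else "impossible"


# Exact fractions keep intermediate results like 8/3 precise.
puzzle = [Fraction(n) for n in (4, 9, 10, 13)]
for text, left in propose(puzzle)[:3]:
    print(text, "->", value(left))
```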


  • The paper introduces the 'Tree of Thoughts' (ToT) framework to enhance LLMs' problem-solving by structuring reasoning as a search across multiple paths.

  • ToT generalizes 'Chain of Thought' (CoT) prompting. Rather than committing to a single reasoning path, the model explores and evaluates several, which supports more deliberate decision-making.

  • The ToT framework includes thought decomposition, generation, evaluation, and the application of search algorithms like BFS and DFS to navigate solution trees.

  • Experiments on Game of 24, Creative Writing, and Mini Crosswords show ToT clearly outperforming IO and CoT prompting on tasks that require planning or search (e.g., 74% vs. 4% success for GPT-4 on Game of 24), suggesting potential for broader application.

Enhancing Generative AI Problem-Solving with Tree of Thoughts (ToT)


LLMs have advanced well beyond simple text generation and are now used for problem solving across many domains. However, they generate text token by token, left to right, and this limits their performance on tasks that demand strategic reasoning, exploration, or lookahead. To address these limitations, we discuss the "Tree of Thoughts" (ToT) framework. ToT extends "Chain of Thought" (CoT) prompting by letting the model explore and evaluate multiple reasoning paths before committing to one.

Background on LLM Problem Solving

Existing LLM problem-solving methods mainly use Input-Output (IO) prompting, CoT prompting, and self-consistency with CoT (CoT-SC). IO prompting maps the input directly to an answer. CoT adds a single chain of intermediate reasoning steps. CoT-SC samples several chains and takes a majority vote over their final answers. None of these methods explores alternatives at intermediate steps or revisits an earlier decision, which limits them on tasks requiring complex reasoning or search. The ToT framework expands the LLM's problem-solving toolkit by exploring potential solutions through a structured search process.
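For contrast with ToT, CoT-SC can be sketched in a few lines. `sample_chain` is a hypothetical stand-in for one sampled CoT completion that returns the reasoning text and the final answer. Note that the vote looks only at final answers, never at the intermediate steps.

```python
from collections import Counter
from typing import Callable, Tuple


def cot_self_consistency(sample_chain: Callable[[], Tuple[str, str]], k: int) -> str:
    """CoT-SC: sample k independent chains, then majority-vote on final answers.

    `sample_chain` is hypothetical: it should prompt the LM once with a CoT
    prompt and return (reasoning_text, final_answer).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    answers = [sample_chain()[1] for _ in range(k)]
    # Only the final answers are compared; the reasoning steps are discarded.
    return Counter(answers).most_common(1)[0][0]


# Usage with canned chains standing in for LM samples (illustrative only).
canned = iter([
    ("one reasoning path", "18"),
    ("a different reasoning path", "18"),
    ("a flawed reasoning path", "26"),
])
print(cot_self_consistency(lambda: next(canned), k=3))
```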

The Tree of Thoughts (ToT) Framework

The ToT framework treats LLM inference as a search over a tree of possible solutions. Each node is a partial solution, and each step down the tree adds a "thought": a coherent language sequence that serves as an intermediate step toward solving the problem. This structure lets the LLM evaluate and choose among multiple paths, much as people solve problems through exploratory search and strategic planning. Key components of ToT include:

  • Thought Decomposition: Breaking down the problem-solving process into discrete steps that facilitate generation, evaluation, and selection.
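  • Thought Generation and Evaluation: Prompting the LLM to propose candidate next thoughts (either sampled independently or proposed in sequence) and to judge how promising each resulting state is (either valued one at a time or compared by voting), so the search knows which paths to pursue.
  • Search Algorithms: Classic strategies such as breadth-first search (BFS) and depth-first search (DFS) use those judgments to explore the thought tree systematically, and DFS can backtrack out of branches that look unpromising.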
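The search step can be sketched generically. Below, `generate` and `evaluate` are hypothetical callables that in ToT would wrap the LM's thought-generation and evaluation prompts. BFS keeps the `breadth` best states at each level. DFS follows the most promising child first, prunes children scoring below a hypothetical `threshold`, and backtracks when a branch fails. The toy problem at the end is purely illustrative.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")


def tot_bfs(root: S, generate: Callable[[S], Iterable[S]],
            evaluate: Callable[[S], float], steps: int, breadth: int) -> List[S]:
    """Breadth-first ToT sketch: expand every frontier state, keep the best few."""
    frontier = [root]
    for _ in range(steps):
        candidates = [child for state in frontier for child in generate(state)]
        if not candidates:
            break
        # Python's sort is stable even with reverse=True, so ties keep generation order.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]
    return frontier


def tot_dfs(state: S, generate: Callable[[S], Iterable[S]],
            evaluate: Callable[[S], float], is_solution: Callable[[S], bool],
            threshold: float, max_depth: int) -> Optional[S]:
    """Depth-first ToT sketch: best child first, prune low scores, backtrack on failure."""
    if is_solution(state):
        return state
    if max_depth == 0:
        return None
    # Score each child once, then visit children from most to least promising.
    scored = sorted(((evaluate(c), c) for c in generate(state)),
                    key=lambda pair: pair[0], reverse=True)
    for score, child in scored:
        if score < threshold:
            break  # children are sorted, so every remaining one scores lower
        found = tot_dfs(child, generate, evaluate, is_solution,
                        threshold, max_depth - 1)
        if found is not None:
            return found
    return None  # no child led to a solution: backtrack to the parent


# Toy problem (hypothetical): reach 10 from 1 using "+1" and "*2" steps.
gen = lambda n: [n + 1, n * 2]
score = lambda n: -abs(10 - n)
print(tot_bfs(1, gen, score, steps=2, breadth=2))
print(tot_dfs(1, gen, score, lambda n: n == 10, threshold=-8, max_depth=4))
```

This BFS sketch does not deduplicate states, so identical branches can crowd the beam. In the toy run, the beam after two steps holds two copies of 4.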

Empirical Exploration

We validate the ToT framework through experiments on three novel tasks designed to test the limits of current LLM problem-solving abilities: Game of 24, Creative Writing, and Mini Crosswords. ToT significantly outperforms IO and CoT prompting. In Game of 24, for instance, GPT-4 solved 74% of puzzles with ToT, compared with 4% using chain-of-thought prompting. These results show ToT's potential for tasks that require complex reasoning, planning, and search.
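Scoring Game of 24 is mechanical. An answer counts only if it uses each of the four input numbers exactly once, combines them with +, -, *, /, and evaluates to 24. A success rate, such as those quoted in the abstract, is then the fraction of puzzles whose final answer passes this check. Below is a minimal sketch of such a checker, assuming answers arrive as plain arithmetic expressions. `is_valid_24` and `_evaluate` are hypothetical names, not taken from the paper's repository.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction
from typing import List

# Only the four arithmetic operators allowed in Game of 24.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node: ast.AST, used: List[int]) -> Fraction:
    """Exactly evaluate +, -, *, / over integer literals, recording each literal."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)  # dividing a Fraction by 0 raises ZeroDivisionError
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    raise ValueError("unsupported syntax in answer")


def is_valid_24(expression: str, numbers: List[int]) -> bool:
    """True iff `expression` uses each puzzle number exactly once and equals 24."""
    try:
        tree = ast.parse(expression, mode="eval")
        used: List[int] = []
        result = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return result == 24 and Counter(used) == Counter(numbers)


print(is_valid_24("(10 - 4) * (13 - 9)", [4, 9, 10, 13]))
```

Parsing with `ast` rather than calling `eval` means only whitelisted arithmetic nodes are ever executed. Using `Fraction` keeps answers like `8 / (3 - 8 / 3)` exact instead of subject to floating-point rounding.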

Implications and Future Directions

The introduction of ToT opens new avenues for LLM research, emphasizing the importance of structured reasoning and strategic search in problem-solving. It highlights a path towards integrating traditional AI search methods with the generative capabilities of LLMs, offering a richer toolkit for tackling complex problems. Future work could extend the ToT framework in several directions, including optimizing search algorithms for efficiency, exploring dynamic thought generation strategies, and applying ToT in domains requiring external knowledge or real-time interaction.


ToT represents a significant step forward in the application of LLMs for problem-solving, offering a structured and systematic approach to explore multiple reasoning paths. By enabling deliberate decision-making and strategic planning, ToT broadens the scope of tasks that LLMs can effectively address, paving the way for more sophisticated AI-assisted problem-solving capabilities.

