Stream of Search (SoS): Learning to Search in Language

(2404.03683)
Published Apr 1, 2024 in cs.LG, cs.AI, and cs.CL

Abstract

Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.

Figure: Stream of Search framework overview, from problem instantiation to iterative language model improvement.

Overview

  • This paper introduces the Stream of Search (SoS) framework, which teaches language models search strategies, including backtracking, by representing the search process in a unified language; the approach is demonstrated on the Countdown problem.

  • SoS uses a unified language to represent key search operations such as exploration, backtracking, and pruning, enabling models to navigate problem spaces autonomously and potentially discover new strategies.

  • Trained on diverse search trajectories rather than optimal paths alone, SoS-pretrained models achieved 25% higher search accuracy on the Countdown problem.

  • Further finetuning with policy improvement methods enabled the models to solve 36% of previously unsolved problems, highlighting SoS's support for model self-improvement and adaptability.

Exploring the Stream of Search: A Framework for Learning to Search within Language Models

Introduction to Stream of Search

Problem-solving with language models typically neglects an aspect integral to human learning and creativity: the ability to explore, make mistakes, and learn from them. Most models are trained on a clean, mistake-free data diet, which limits their capacity to recognize errors or consider alternative solutions. This paper proposes Stream of Search (SoS), a framework that teaches language models to search and backtrack by representing the entire search process in a unified language. The method is demonstrated on the Countdown problem, where it yields a significant improvement in solving ability over models trained solely on optimal solution paths.

Unified Language for Search

At the heart of the SoS framework is the systematic representation of search strategies in a unified language, covering key operations such as exploration, backtracking, and pruning. By expressing these operations in language, the paper opens the door to training models that can navigate problem spaces on their own, combine different strategies, and potentially invent new ones. Such a unified language for search both expands a model's problem-solving toolkit and lets it learn from the exploratory, error-laden process that characterizes human reasoning.
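
To make this concrete, the sketch below shows how a depth-first search over Countdown states can be flattened into a single string, with every exploration step and dead end logged in order. The marker phrases ("Current State", "Exploring", "Dead End", "Goal Reached") are illustrative assumptions, not the paper's exact token vocabulary:

```python
from itertools import combinations

def candidate_ops(a, b):
    """Enumerate legal Countdown operations on a pair of numbers."""
    yield f"{a}+{b}", a + b
    yield f"{a}*{b}", a * b
    for x, y in ((a, b), (b, a)):
        if x - y >= 0:                       # keep intermediate values non-negative
            yield f"{x}-{y}", x - y
        if y != 0 and x % y == 0:            # only exact integer division
            yield f"{x}/{y}", x // y

def dfs(nums, target, trace):
    """Depth-first search that logs every step, including dead ends."""
    if len(nums) == 1:
        if nums[0] == target:
            trace.append(f"Goal Reached: {nums[0]} == {target}")
            return True
        trace.append(f"Dead End: {nums[0]} != {target}, backtracking")
        return False
    for i, j in combinations(range(len(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        for expr, value in candidate_ops(nums[i], nums[j]):
            trace.append(f"Exploring: {expr} = {value}, numbers now {rest + [value]}")
            if dfs(rest + [value], target, trace):
                return True
    return False

trace = ["Current State: numbers [4, 6, 10], target 36"]
dfs([4, 6, 10], 36, trace)
print("\n".join(trace))  # this flattened string is one training example
```

Running this on numbers [4, 6, 10] with target 36 produces a trace that wanders through failed branches (e.g., 4 + 6 = 10) before reaching (10 - 4) * 6 = 36; it is precisely these recorded failures that distinguish a stream of search from an optimal-path demonstration.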

Training and Evaluation

The Countdown problem, chosen for its combination of simple rules and a large search space, served as the proving ground for SoS. A transformer-based model was trained from scratch on a dataset of search trajectories generated by heuristic solvers employing diverse strategies. Compared to models trained only on optimal paths, SoS pretraining increased search accuracy by 25%. This improvement speaks to the value of exposing models to the full process of search and decision-making, dead ends included, rather than only to its polished end products.
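
A hedged sketch of how such a corpus might be assembled follows, reusing the dfs helper from the previous sketch as a stand-in for one of the paper's heuristic solvers. The prompt format and the single-solver pool are assumptions for brevity; the paper samples from a mix of BFS- and DFS-style solvers with different heuristics:

```python
import random

def dfs_stream(nums, target):
    """Wrap the dfs sketch above: return one flattened search trace."""
    trace = [f"Current State: numbers {nums}, target {target}"]
    dfs(nums, target, trace)
    return "\n".join(trace)

# The paper's corpus mixes traces from several heuristic solvers; this
# single-entry pool is a placeholder for that diversity.
SOLVERS = [dfs_stream]

def make_dataset(problems, rng=random.Random(0)):
    """One training string per problem: prompt plus a sampled solver's trace."""
    return [
        f"Make {target} with {nums}:\n" + rng.choice(SOLVERS)(nums, target)
        for nums, target in problems
    ]

corpus = make_dataset([([4, 6, 10], 36), ([3, 5, 8], 64)])
```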

Policy Improvement Techniques

Building on SoS pretraining, the paper investigates the model's capacity for self-improvement using two policy improvement methods: Advantage-Induced Policy Alignment (APA) and the Self-Taught Reasoner (STaR). The finetuned models solved 36% of the problems the pretrained model had left unsolved, including some that none of the heuristic solvers used to generate the training data could solve. This result underscores the potential of language models to move beyond the limits of their initial training data through policy improvement.
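
APA is a PPO-style reinforcement learning method and is not sketched here; the STaR loop, by contrast, is simple enough to outline. The sketch below shows its filter-then-imitate structure under the trace format assumed earlier; sample_stream and finetune are placeholders for the model's decoding and training steps, not the paper's API:

```python
def solves(stream, target):
    """Naive check: accept a stream whose last line claims the target.
    A real verifier should also replay and re-check every arithmetic step."""
    return stream.splitlines()[-1] == f"Goal Reached: {target} == {target}"

def star_round(model, problems, sample_stream, finetune, k=4):
    """One STaR iteration: sample k streams per problem, keep the correct
    ones, finetune on the kept set, and return the improved model."""
    kept = []
    for nums, target in problems:
        streams = [sample_stream(model, nums, target) for _ in range(k)]
        kept += [s for s in streams if solves(s, target)]   # correctness filter
    return finetune(model, kept)   # imitate only the verified traces
```

Iterating star_round a few times lets the model bootstrap on its own verified traces, which is how, per the paper, it comes to solve problems its heuristic teachers could not.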

Implications and Future Directions

The Stream of Search framework reshapes our understanding of the capabilities of language models in planning, problem-solving, and learning. By training models to engage with the messy, exploratory process of search, we can unlock more dynamic and versatile problem-solving abilities. This research not only provides a tangible step towards equipping LMs with the tools for internal search and discovery but also lays the groundwork for future developments in AI that can learn, adapt, and innovate in more human-like ways.

Furthermore, the implications for practical applications are vast, ranging from enhanced problem-solving in specific domains to the development of more generalized AI capable of tackling a broader spectrum of challenges. As we continue to push the boundaries of what language models can achieve, frameworks like SoS will be instrumental in guiding their evolution towards more sophisticated and creative forms of intelligence.

Concluding Thoughts

The Stream of Search framework marks a significant advancement in language model research. By embedding the intricacies of search within language itself, it opens new avenues for models to learn, grow, and innovate. Looking ahead, the possibility that models trained under this framework could discover entirely new search strategies, or solve problems that have long evaded algorithmic solutions, points to the untapped potential residing within language models.
