Soft Self-Consistency Improves Language Model Agents

(2402.13212)
Published Feb 20, 2024 in cs.CL, cs.AI, and cs.LG

Abstract

Generations from LLMs can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (Soft-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. Soft-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, Soft-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that Soft-SC can be applied to both open-source and black-box models.

The method improves success rates over self-consistency on long-horizon interactive tasks.

Overview

  • Soft-SC introduces a continuous scoring mechanism for language model (LM) agents, enhancing performance and efficiency in tasks with diverse solutions.

  • The method departs from traditional self-consistency by using model likelihoods to score and select actions, which is especially beneficial in sparse action spaces.

  • Empirical analysis shows that Soft-SC outperforms traditional selection methods and that its benefits scale with model size.

  • Soft-SC's integration of adaptive sampling and continuous scoring broadens its applicability and potential for future research in complex AI systems.

Enhancement of Language Model Agents via Soft Self-Consistency

Introduction to Soft Self-Consistency (Soft-SC)

Language model (LM) agents tasked with interactive, multi-step operations must select an action at every step, and poor selections compound over a long horizon. Traditional methods such as self-consistency (SC) address this by generating multiple candidate solutions and choosing the final answer by majority vote. However, SC's effectiveness drops when many distinct answers are valid, because votes can only be tallied over identical actions. This paper introduces Soft Self-Consistency (Soft-SC), which replaces exact-match vote counting with a continuous score derived from model likelihoods. The method improves both performance and efficiency, particularly in domains with sparse action spaces, and attains comparable or better performance than SC with roughly half as many samples.

Methodological Innovations

Soft-SC's Core Concept

Soft-SC diverges from SC's reliance on exact matches for scoring, instead utilizing a continuous score calculated from model likelihoods. This enables effective action selection even when sampled actions are sparsely distributed, making the method well suited to interactive tasks with multiple valid answers per step.
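To make the contrast concrete, the minimal Python sketch below compares SC-style majority voting with a likelihood-based, Soft-SC-style selection. The aggregation used here (mean token log-probability) and the helper names are illustrative assumptions rather than the paper's exact implementation.

```python
from collections import Counter

def majority_vote(actions: list[str]) -> str:
    """Standard self-consistency (SC): pick the most frequent exact-match action.
    This cannot discriminate when every sampled action is distinct."""
    return Counter(actions).most_common(1)[0][0]

def soft_sc_select(actions: list[str], token_logprobs: list[list[float]]) -> str:
    """Soft-SC-style selection (sketch): score each sampled action with a
    continuous likelihood-based score (here, mean token log-probability)
    and return the argmax. `token_logprobs[i]` holds the log-probabilities
    of the tokens in actions[i], as returned by the sampling API."""
    scores = [sum(lps) / max(len(lps), 1) for lps in token_logprobs]
    best = max(range(len(actions)), key=lambda i: scores[i])
    return actions[best]

# Toy usage: three distinct but plausible bash commands sampled for one step.
actions = ["ls -la /tmp", "ls -l /tmp", "find /tmp -maxdepth 1"]
token_logprobs = [[-0.2, -0.4, -0.1], [-0.3, -0.5], [-0.9, -1.1, -0.8, -0.7]]
print(majority_vote(actions))                    # tie: every action appears once
print(soft_sc_select(actions, token_logprobs))   # picks the highest-likelihood action
```

The toy example illustrates the failure mode described above: with all-distinct samples, majority voting degenerates to an arbitrary pick, while the continuous score still provides a meaningful ranking.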

Adaptive Sampling

Soft-SC incorporates an adaptive sampling strategy that dynamically adjusts the number of samples drawn, stopping early once the likelihood-based scores reach a threshold rather than always spending a fixed budget. This refinement improves sample efficiency while maintaining or improving task performance.
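A sketch of such an adaptive loop is shown below. The `sample_fn`/`score_fn` callbacks, the threshold value, and the early-stopping rule are assumptions for illustration, not the paper's exact procedure.

```python
import random
from typing import Callable

def adaptive_soft_sc(sample_fn: Callable[[], tuple[str, list[float]]],
                     score_fn: Callable[[list[float]], float],
                     threshold: float = -0.5,
                     max_samples: int = 8) -> str:
    """Adaptive sampling sketch: draw candidate actions one at a time and stop
    early once the best likelihood-based score clears a threshold, otherwise
    fall back to the full sampling budget."""
    best_action, best_score = None, float("-inf")
    for _ in range(max_samples):
        action, lps = sample_fn()       # one sampled action plus its token log-probs
        score = score_fn(lps)           # e.g. mean token log-probability
        if score > best_score:
            best_action, best_score = action, score
        if best_score >= threshold:     # confident enough: stop sampling early
            break
    return best_action

# Example wiring with a dummy sampler standing in for an LLM call.
def dummy_sample() -> tuple[str, list[float]]:
    action = random.choice(["ls /tmp", "cat file.txt", "echo done"])
    return action, [random.uniform(-1.0, -0.1) for _ in range(3)]

print(adaptive_soft_sc(dummy_sample, lambda lps: sum(lps) / len(lps)))
```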

Empirical Evaluations

The paper's experimental analysis reveals several key findings:

  • Soft-SC consistently outperforms SC and greedy decoding baselines across diverse interactive tasks, demonstrating substantial improvements in success rates with fewer samples.
  • Importantly, Soft-SC's benefits scale with increased model size, suggesting that larger models can further leverage this method for performance gains.
  • Additionally, Soft-SC is adaptable to both open-source and proprietary black-box models, broadening its applicability.

Practical and Theoretical Implications

Soft-SC presents a meaningful advancement in the field of LM agents, particularly for applications involving complex sequences of actions. This method's ability to efficiently handle diversity in valid actions and improve upon existing selection methodologies points to significant potential for enhancing interactive AI systems. Theoretically, Soft-SC's approach to scoring adds a new dimension to understanding how LLMs can be optimized for varied and nuanced tasks, promoting further research into continuous scoring mechanisms.

Future Directions and Considerations

The introduction of Soft-SC opens avenues for future exploration, including its integration with other AI optimization techniques and the extension to more diverse tasks beyond the ones tested. Additionally, considering its performance improvements and efficiency gains, subsequent studies could investigate Soft-SC's applicability in real-world scenarios, where LLM agents are tasked with navigating complex environments or performing intricate sequences of actions.

Conclusion

In summary, Soft Self-Consistency offers a robust and efficient method for improving the performance of language model agents across a range of interactive tasks. By addressing the limitations inherent in traditional majority voting approaches, Soft-SC provides a compelling solution that enhances both the accuracy and efficiency of LLM agents, setting a new benchmark for future developments in the field.
