Soft Self-Consistency Improves Language Model Agents
(arXiv:2402.13212)

Abstract
Generations from LLMs can be improved by sampling and scoring multiple solutions to select a final answer. Current "sample and select" methods such as self-consistency (SC) rely on majority voting to score answers. However, when tasks have many distinct and valid answers, selection by voting requires a large number of samples. This makes SC prohibitively expensive for interactive tasks that involve generating multiple actions (answers) sequentially. After establishing that majority voting fails to provide consistent gains on such tasks, we demonstrate how to increase success rates by softening the scoring criterion. We introduce Soft Self-Consistency (Soft-SC), which replaces SC's discontinuous scoring with a continuous score computed from model likelihoods, allowing for selection even when actions are sparsely distributed. Soft-SC improves both performance and efficiency on long-horizon interactive tasks, requiring half as many samples as SC for comparable or better performance. For a fixed number of samples, Soft-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping (WebShop), and a 4.7% increase for an interactive household game (ALFWorld). Finally, we show that Soft-SC can be applied to both open-source and black-box models.
Overview
- Soft-SC introduces a continuous scoring mechanism for language model (LM) agents, improving performance and efficiency on tasks with many distinct valid solutions.
- The method diverges from traditional self-consistency by using model likelihoods for action selection, which is especially beneficial in sparse action spaces.
- Empirical analysis shows that Soft-SC outperforms traditional selection methods and that its benefits scale with model size.
- Soft-SC's combination of adaptive sampling and continuous scoring broadens its applicability and its potential for future research on complex AI systems.
Enhancement of Language Model Agents via Soft Self-Consistency
Introduction to Soft Self-Consistency (Soft-SC)
Language model (LM) agents tasked with interactive, multi-step operations face challenges that can significantly affect their performance and efficiency. Traditional methods like self-consistency (SC) address these by generating multiple solutions and using majority voting to choose a final answer. However, SC's effectiveness drops in scenarios with many diverse valid solutions, because tallying votes requires identical actions. This paper introduces an approach termed Soft Self-Consistency (Soft-SC) that moves past the limitations of exact-match scoring by using a continuous scoring mechanism instead. The method improves both performance and efficiency, particularly in domains with sparse action spaces; notably, Soft-SC matches or exceeds SC's performance while requiring fewer samples across a variety of tests.
Methodological Innovations
Soft-SC's Core Concept
Soft-SC diverges from SC's reliance on exact matches for scoring, instead utilizing a continuous score calculated from model likelihoods. This approach enables effective action selection among sparsely distributed options, showcasing its utility in interactive tasks with multiple valid answers per step.
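The selection step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact implementation: it assumes each sampled action comes with per-token log-probabilities from the model (real APIs differ in how these are exposed), and it uses the mean token log-probability as one plausible continuous aggregation.

```python
def soft_sc_select(candidates):
    """Pick the action with the highest continuous likelihood score.

    `candidates` is a list of (action, token_logprobs) pairs, where
    token_logprobs are per-token log-probabilities for that sampled
    action (a hypothetical interface; actual model APIs vary).
    """
    def score(token_logprobs):
        # Length-normalized log-likelihood: mean token log-probability.
        return sum(token_logprobs) / len(token_logprobs)

    return max(candidates, key=lambda c: score(c[1]))[0]

# Hypothetical samples: three distinct but valid-looking bash actions,
# so exact-match majority voting would see a three-way tie.
samples = [
    ("ls -la /tmp", [-0.2, -0.5, -0.3]),
    ("ls /tmp -la", [-0.9, -1.2, -0.8]),
    ("dir /tmp",    [-1.5, -2.0, -1.1]),
]
print(soft_sc_select(samples))  # → ls -la /tmp
```

Because the score is continuous, a winner emerges even when every sampled action is unique, which is exactly the regime where vote counting breaks down.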
Adaptive Sampling
Soft-SC incorporates an adaptive sampling strategy that dynamically adjusts the number of samples based on the convergence of scores towards a threshold. This refinement not only enhances sample efficiency but also contributes to superior task performance with a smaller sampling footprint.
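A rough sketch of such an early-stopping loop is shown below. The threshold value, the `sample_fn`/`score_fn` interfaces, and the stop condition are illustrative assumptions, not the paper's exact procedure: the point is only that sampling halts as soon as some candidate's continuous score is deemed good enough.

```python
def adaptive_soft_sc(sample_fn, score_fn, threshold, max_samples):
    """Draw samples one at a time, stopping early once a candidate's
    continuous score clears `threshold`; otherwise return the best of
    `max_samples` draws. `sample_fn` and `score_fn` stand in for the
    model call and the likelihood-based scorer (assumed interfaces).
    """
    best_action, best_score = None, float("-inf")
    for _ in range(max_samples):
        action = sample_fn()
        s = score_fn(action)
        if s > best_score:
            best_action, best_score = action, s
        if best_score >= threshold:
            break  # confident enough; skip the remaining samples
    return best_action

# Toy usage with precomputed (action, score) draws.
draws = [("echo hi", -1.0), ("cat log.txt", -0.1), ("ls", -0.05)]
it = iter(draws)
scores = dict(draws)
result = adaptive_soft_sc(lambda: next(it)[0], scores.__getitem__,
                          threshold=-0.2, max_samples=3)
print(result)  # → cat log.txt (stops after the second draw)
```

In the toy run, the second draw already clears the threshold, so the third sample is never requested, which is the source of the sample-efficiency gains described above.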
Empirical Evaluations
The paper's experimental analysis reveals several key findings:
- Soft-SC consistently outperforms SC and greedy decoding baselines across diverse interactive tasks, demonstrating substantial improvements in success rates with fewer samples.
- Importantly, Soft-SC's benefits scale with increased model size, suggesting that larger models can further leverage this method for performance gains.
- Additionally, Soft-SC is adaptable to both open-source and proprietary black-box models, broadening its applicability.
Practical and Theoretical Implications
Soft-SC presents a meaningful advancement in the field of LM agents, particularly for applications involving complex sequences of actions. This method's ability to efficiently handle diversity in valid actions and improve upon existing selection methodologies points to significant potential for enhancing interactive AI systems. Theoretically, Soft-SC's approach to scoring adds a new dimension to understanding how LLMs can be optimized for varied and nuanced tasks, promoting further research into continuous scoring mechanisms.
Future Directions and Considerations
The introduction of Soft-SC opens avenues for future exploration, including its integration with other AI optimization techniques and the extension to more diverse tasks beyond the ones tested. Additionally, considering its performance improvements and efficiency gains, subsequent studies could investigate Soft-SC's applicability in real-world scenarios, where LLM agents are tasked with navigating complex environments or performing intricate sequences of actions.
Conclusion
In summary, Soft Self-Consistency offers a robust and efficient method for improving the performance of language model agents across a range of interactive tasks. By addressing the limitations inherent in traditional majority voting approaches, Soft-SC provides a compelling solution that enhances both the accuracy and efficiency of LLM agents, setting a new benchmark for future developments in the field.