
Self-Discover: Large Language Models Self-Compose Reasoning Structures

(2402.03620)
Published Feb 6, 2024 in cs.AI and cs.CL

Abstract

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

Self-discovered reasoning structures transfer robustly across models and outperform prompts tuned by prompt-optimization methods.

Overview

  • Self-Discover is a framework that lets LLMs self-compose task-specific reasoning structures, enhancing their reasoning capabilities across a range of tasks.

  • It outperforms Chain of Thought (CoT) prompting by up to 32% and inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring substantially less inference compute.

  • The self-discovered reasoning structures transfer across model families (e.g., from PaLM 2-L to GPT-4 and from GPT-4 to Llama2) and share commonalities with human reasoning patterns.

  • Self-Discover's methodology offers a scalable, efficient, and interpretable approach to improving LLM reasoning, providing a task-specific alternative to generic prompting methods.

Self-Discover: Enhancing LLM Reasoning Through Self-Discovered Reasoning Structures

Introduction to Self-Discover

LLMs have been at the forefront of producing coherent text and following instructions with considerable success. These models, powered by transformers, have shown potential in various applications, including text generation and task execution. As part of ongoing efforts to enhance LLMs' reasoning capabilities, a variety of prompting methods inspired by cognitive theories have emerged. Methods such as Chain of Thought (CoT), decomposition-based prompting, and step-back prompting aim to mimic human problem-solving steps or to break complex problems into smaller, manageable parts. However, these techniques typically assume a one-size-fits-all reasoning module, disregarding the intrinsic structure unique to each task. Addressing this limitation, the "Self-Discover" framework lets LLMs self-compose reasoning structures tailored to individual tasks: the model first selects, adapts, and implements atomic reasoning modules into an explicit, task-level reasoning plan, and then follows that plan when solving each instance, significantly improving reasoning performance across challenging benchmarks.
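
The two-stage procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: call_llm is a hypothetical stand-in for any LLM API, the seed module list is abbreviated from the paper's 39 modules, and the meta-prompts are paraphrased.

```python
# Minimal sketch of Self-Discover's two stages (illustrative, not the
# authors' code). `call_llm` is a hypothetical stand-in for any LLM API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your preferred LLM client here.")

# A few of the 39 seed reasoning modules, abbreviated from the paper.
SEED_MODULES = [
    "How could I devise an experiment to help solve the problem?",
    "How can I simplify the problem so that it is easier to solve?",
    "Critical thinking: analyze the problem from different perspectives.",
    "Let's think step by step.",
]

def self_discover_structure(task_examples: list[str]) -> str:
    """Stage 1 (run once per task): SELECT, ADAPT, IMPLEMENT."""
    # SELECT: pick the seed modules relevant to this task.
    selected = call_llm(
        "Select several reasoning modules that are crucial to solving the "
        f"tasks below.\nModules: {SEED_MODULES}\nTask examples: {task_examples}"
    )
    # ADAPT: rephrase the selected modules so they are task-specific.
    adapted = call_llm(
        "Rephrase and specify each selected reasoning module so that it "
        f"better helps solve the tasks.\nModules: {selected}\n"
        f"Task examples: {task_examples}"
    )
    # IMPLEMENT: operationalize the adapted modules into a step-by-step
    # reasoning plan, expressed as a JSON structure with empty value slots.
    return call_llm(
        "Operationalize the reasoning modules into a step-by-step reasoning "
        f"plan in JSON format.\nModules: {adapted}\nTask examples: {task_examples}"
    )

def solve(instance: str, structure: str) -> str:
    """Stage 2 (run per instance): follow the discovered structure."""
    return call_llm(
        "Follow the step-by-step reasoning plan in JSON to solve the task. "
        "Fill in the value for each key, then give the final answer.\n"
        f"Reasoning structure:\n{structure}\nTask: {instance}"
    )
```

Because Stage 1 runs only once per task, its three calls are amortized across every instance that follows.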

Key Contributions of Self-Discover

  • Enhanced Performance on Reasoning Benchmarks: Self-Discover has demonstrated substantial improvements on various challenging reasoning tasks, such as BigBench-Hard, grounded agent reasoning, and MATH, with performance gains reaching up to 32% over traditional CoT prompting methods. Additionally, it outperformed inference-intensive methods like CoT-Self-Consistency by over 20%, with significantly reduced computational demands.
  • Computational Efficiency: The framework adds only three inference calls at the task level (one each for SELECT, ADAPT, and IMPLEMENT), after which each instance is solved in a single decoding pass. This contrasts sharply with methods demanding 10-40 times more inference compute; see the illustrative call-count sketch after this list.
  • Transferability and Universality: The self-discovered reasoning structures are not only universally applicable across different model families but also exhibit similarities with human reasoning patterns. This underscores the framework's adaptability and its potential to enhance reasoning tasks across various LLM implementations.
  • Interpretability: By grounding the discovered reasoning structures in atomic reasoning modules, Self-Discover provides interpretable insights into LLMs’ task-solving strategies. This is a notable advantage over methods relying on less transparent optimized prompts.
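
The efficiency claim is easiest to see with a back-of-the-envelope call count. The sketch below is illustrative: the instance count and the 10-40 samples assumed for self-consistency are hypothetical choices for concreteness, not figures reported in the paper.

```python
# Illustrative inference-call accounting; the instance and sample counts
# below are assumptions for concreteness, not figures from the paper.

def self_discover_calls(num_instances: int) -> int:
    # 3 task-level calls (SELECT, ADAPT, IMPLEMENT) + 1 decoding per instance.
    return 3 + num_instances

def cot_self_consistency_calls(num_instances: int, samples: int) -> int:
    # One sampled chain of thought per vote, for every instance.
    return num_instances * samples

if __name__ == "__main__":
    n = 250  # hypothetical number of test instances in one task
    sd = self_discover_calls(n)
    for k in (10, 40):
        sc = cot_self_consistency_calls(n, k)
        print(f"Self-Discover: {sd} calls vs. CoT-SC with {k} samples: "
              f"{sc} calls ({sc / sd:.1f}x more)")
```

With these assumed figures, Self-Discover's per-instance cost approaches a single call, which is where the 10-40x compute savings quoted above comes from.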

Experimental Setup and Findings

The Self-Discover framework was evaluated on 25 challenging reasoning tasks drawn from BigBench-Hard (BBH), the Thinking for Doing (T4D) agent-reasoning benchmark, and MATH. Using state-of-the-art models such as GPT-4 and PaLM 2-L, the framework outperformed existing prompting methods on the large majority of these tasks. Notably, on T4D, Self-Discover with GPT-4 achieved over 85% accuracy.

Implications and Future Directions

The research introduces an innovative approach to reasoning in LLMs, moving away from reliance on generic prompting methods toward task-specific, self-composed reasoning structures. This not only enhances the performance and efficiency of LLMs but also provides a more interpretable window into how models reason. Looking forward, the potential of Self-Discover to adapt and improve across various LLM architectures opens new avenues for research, especially in domains where reasoning and complex problem-solving are crucial. The framework's overlap with human reasoning patterns presents exciting opportunities for human-AI collaboration, further pushing the boundaries of what AI can achieve.

Conclusion

"Self-Discover" marks a significant step forward in LLM reasoning, offering a scalable and efficient methodology for self-composing reasoning structures. Its success across challenging benchmarks, combined with computational efficiency and the universality of its application, underscores the potential of LLMs to tackle complex reasoning tasks. As AI continues to evolve, frameworks like Self-Discover are pivotal in harnessing the true reasoning capabilities of LLMs, offering insights and directions for future research in the field.
