Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs (2407.03181v2)

Published 3 Jul 2024 in cs.CL

Abstract: Requiring a LLM to generate intermediary reasoning steps, known as Chain of Thought (CoT), has been shown to be an effective way of boosting performance. Previous approaches have focused on generating multiple independent CoTs, combining them through ensembling or other post-hoc strategies to enhance reasoning. In this work, we introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step, which is fundamentally different from prior work that primarily operate on parallel CoT generations. DCoT allows LLMs to gain the ability to perform within-inference refinement of reasoning chains without requiring external feedback. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales (1.3B to 70B). These improvements are particularly impactful for tasks with a large result state space, such as those involving numeric answers. Our work is also significant because both quantitative analyses and manual evaluations reveal the observed gains stem from the models' ability to refine an initial reasoning chain by generating a second, improved chain within the same inference step, demonstrating previously elusive self-improvement. Our code and data are publicly available at https://github.com/UKPLab/acl2025-diverse-cot.

Citations (4)

Summary

  • The paper introduces Diverse Chains of Thought (DCoT), a fine-tuning method that enables LLMs to generate multiple reasoning paths and self-correct within a single inference for enhanced accuracy.
  • The paper demonstrates that even smaller LLMs benefit from DCoT, achieving significant performance improvements across tasks including mathematics and multi-hop reasoning.
  • The paper highlights DCoT's potential to democratize AI by enabling self-correction without external feedback, thereby broadening the applicability of LLMs.

Fine-Tuning with Diverse Chains of Thought Boosts Reasoning Through Self-Correction in LLMs

This paper introduces Diverse Chains of Thought (DCoT), a novel approach to enhancing the reasoning capabilities of LLMs. The method builds on Chain of Thought (CoT) prompting, which improves performance by generating intermediate reasoning steps, and advances it by having the model generate and compare multiple reasoning chains within a single inference step, potentially increasing the accuracy of the final answers.

Methodology

The key innovation of the DCoT framework is its ability to instruct LLMs to produce several diverse reasoning paths before arriving at a final decision. The design is inspired by the cognitive theories of Divergent and Convergent Thinking, which describe a multi-phase approach to problem-solving: generating numerous candidate ideas (the divergent phase) and then synthesizing them into a single solution (the convergent phase).

For implementation, DCoT requires fine-tuning models on datasets that contain multiple reasoning chains per question, so the model learns to generate several candidate solutions and select among them. This addresses a limitation of prior approaches, which obtain multiple reasoning chains only through separate, parallel generations combined post hoc, rather than producing and refining them within a single inference.
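To make the fine-tuning setup concrete, the following is a minimal sketch of how a DCoT-style training example could be assembled, assuming a format in which several annotated reasoning chains and one final answer are concatenated into a single target sequence. The tag strings, prompt wording, and the choice of two chains are illustrative assumptions rather than the paper's exact data format; the linked repository contains the actual implementation.

```python
# Minimal sketch of assembling a DCoT-style fine-tuning example.
# The "[CoT i]" / "[Final answer]" tags, the prompt wording, and the use of
# two chains are illustrative assumptions, not the paper's exact format;
# see https://github.com/UKPLab/acl2025-diverse-cot for the released code.

def build_dcot_example(question: str, reasoning_chains: list[str], final_answer: str) -> dict:
    """Pack several diverse reasoning chains plus one final answer into a
    single target sequence, so the model learns to produce all chains
    (divergent phase) and then commit to one answer (convergent phase)
    within a single inference step."""
    parts = [f"[CoT {i}]\n{chain}" for i, chain in enumerate(reasoning_chains, start=1)]
    parts.append(f"[Final answer]\n{final_answer}")
    return {
        "prompt": (
            f"Question: {question}\n"
            f"Give {len(reasoning_chains)} different chains of thought, then answer."
        ),
        "target": "\n\n".join(parts),
    }

# Toy example with two alternative derivations of the same numeric answer.
example = build_dcot_example(
    question="A book costs 12 dollars and a pen costs 3 dollars. "
             "How much do 2 books and 4 pens cost?",
    reasoning_chains=[
        "2 books cost 2 * 12 = 24 dollars and 4 pens cost 4 * 3 = 12 dollars, "
        "so the total is 24 + 12 = 36 dollars.",
        "One book with two pens costs 12 + 2 * 3 = 18 dollars, and two such "
        "bundles make 2 books and 4 pens, so the total is 2 * 18 = 36 dollars.",
    ],
    final_answer="36 dollars",
)
print(example["target"])
```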

Results

The experiments spanned models ranging from 1.3B to 70B parameters and demonstrated consistent improvements over the CoT baseline. Notably, the results show that even smaller, more accessible LLMs benefit from this fine-tuning approach. The performance gains held across a wide variety of tasks, indicating the method's broad applicability.

Quantitatively, the work showed performance improvements across various datasets, including mathematics, logic, and multi-hop reasoning tasks. Furthermore, DCoT allowed models to improve their accuracy without any external feedback by refining an initial reasoning chain within the same inference step, a self-correcting capability that had previously proved elusive.
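As a rough illustration of how such within-inference refinement could be consumed at test time, the snippet below parses a DCoT-style output and keeps only the answer committed to after the final chain. It assumes the tagged output format from the sketch above; both the tags and the parsing heuristics are assumptions rather than the authors' evaluation code.

```python
# Sketch of extracting the final (possibly refined) answer from a DCoT output.
# Assumes the tagged format from the previous sketch; not the authors' code.
import re

def answer_from_dcot_output(generated_text: str) -> str:
    """Return the answer the model commits to after its last reasoning chain.
    Later chains may revise earlier ones, so only the [Final answer] section
    (or, failing that, the text after the last chain tag) is kept."""
    match = re.search(r"\[Final answer\]\s*(.+)", generated_text, flags=re.DOTALL)
    if match:
        return match.group(1).strip()
    # Fallback: keep whatever follows the last "[CoT k]" tag.
    chunks = re.split(r"\[CoT \d+\]", generated_text)
    return chunks[-1].strip()

# Usage: generated = <single decoding pass of the fine-tuned model on a prompt>
# print(answer_from_dcot_output(generated))
```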

Implications and Future Directions

The implications of this research are multifaceted. Practically, the introduction of DCoT empowers smaller models to achieve enhanced performance, making high-quality reasoning tasks more accessible without requiring extensive computational resources. This democratizes access to powerful AI and broadens the range of applications for which these LLMs can be effectively utilized.

Theoretically, the success of this method suggests that further exploration into divergent thinking strategies might unlock additional reasoning capabilities in LLMs. The framework presents a new paradigm where multi-step reasoning does not rely solely on external oversight or feedback loops.

Future research may explore the integration of DCoT within larger, more context-rich models or alternative reasoning paradigms such as code prompting or graph-based reasoning. Additionally, investigating the differential impacts of various scales of divergent reasoning (i.e., number of reasoning chains generated) could offer deeper insights into optimizing model training and inference strategies.
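One way to study that last question empirically is sketched below: run the same evaluation set while requesting a different number of chains k per question and compare accuracies. The helper callables (a generation function and a correctness check) are hypothetical placeholders, and the probe as a whole is only a sketch of the suggested analysis, not an experiment from the paper.

```python
# Hypothetical probe of how accuracy varies with the number of requested chains.
# generate_fn and is_correct are placeholder callables supplied by the caller;
# answer_from_dcot_output is the parser from the earlier sketch.
from typing import Callable

def accuracy_by_num_chains(
    dev_set: list[dict],                        # each item: {"question": ..., "answer": ...}
    generate_fn: Callable[[str, int], str],     # (question, k) -> model output text
    is_correct: Callable[[str, str], bool],     # (predicted, gold) -> bool
    max_chains: int = 4,
) -> dict[int, float]:
    """Evaluate the same dev set while asking for 1..max_chains reasoning
    chains per question, each produced in a single inference pass."""
    results = {}
    for k in range(1, max_chains + 1):
        hits = sum(
            is_correct(answer_from_dcot_output(generate_fn(item["question"], k)),
                       item["answer"])
            for item in dev_set
        )
        results[k] = hits / len(dev_set)
    return results
```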

This research underscores the value of fine-tuning with complex reasoning data and sets the stage for subsequent advancements in enhancing AI reasoning through refined model training techniques.
