Abstract

LLMs have shown outstanding performance across a wide range of downstream tasks. This competency is attributed to their substantial parameter counts and pre-training on extensive corpora. Moreover, LLMs have exhibited enhanced reasoning capabilities on complex reasoning tasks thanks to "Chain-of-Thought (CoT) prompting", a method that generates intermediate reasoning steps to guide inference of the final answer. However, these advanced reasoning abilities appear to emerge only in models with at least 10 billion parameters, limiting their applicability in situations where computational resources are constrained. In this paper, we investigate the possibility of transferring the reasoning capabilities of LLMs to smaller models via knowledge distillation. Specifically, we propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. This design enables more effective use of rationales during the answer-inference stage, leading to improved performance on scientific question-answering tasks. With Sci-CoT, our 80-million-parameter model is able to exceed the performance of BLOOM-176B on the ARC-Easy dataset under a few-shot setting.

Sci-CoT framework: rationale generation and answer inference, outperforming the previous one-stage method.

Overview

  • The Sci-CoT framework is introduced to transfer reasoning capabilities from LLMs to smaller models, enhancing their performance in scientific QA tasks through a two-stage knowledge distillation process.

  • In the first stage, a large teacher model generates rationales for questions, which are used to fine-tune a smaller student model. This smaller model then conducts answer inference tasks in the second stage.

  • Experimental results demonstrate the effectiveness of Sci-CoT, achieving notable performance improvements over traditional methods and showing significant advancements even with reduced training data.

Sci-CoT: Leveraging LLMs for Enhanced Knowledge Distillation in Small Models for Scientific QA

Introduction

This study presents a novel approach to enhancing the reasoning capabilities of smaller language models on scientific question-answering (QA) tasks by leveraging the strengths of LLMs through knowledge distillation. Prior work has shown that the reasoning abilities of LLMs, typically elicited through "Chain-of-Thought (CoT)" prompting, do not emerge in models with fewer than 10 billion parameters. To address this limitation, the authors introduce Sci-CoT, a two-stage framework designed to transfer reasoning capabilities from LLMs to smaller models, enhancing their performance on scientific QA tasks.

Methodology

The proposed Sci-CoT framework separates rationale generation from answer inference, using distinct models for the two stages. First, an LLM serves as the "teacher" and generates detailed rationales for the given questions. These rationales are then used to fine-tune a smaller "student" model to produce rationales itself. In the subsequent stage, the student, now equipped with the ability to generate rationales, is used to perform answer inference.
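
A minimal sketch of the first stage, rationale generation, is shown below. It assumes the OpenAI chat API as the teacher interface; the prompt template and helper name are illustrative rather than the authors' exact setup.

```python
# Stage 1 (sketch): query a teacher LLM for a rationale per training question.
# Assumes the openai>=1.0 Python client; the prompt wording is illustrative,
# not the paper's exact template.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_rationale(question: str, options: list[str], answer: str) -> str:
    """Ask the teacher for step-by-step reasoning that leads to the known answer."""
    prompt = (
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
        f"Correct answer: {answer}\n"
        "Explain step by step why this answer is correct."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content
```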

The modeling setup uses GPT-3.5-turbo as the teacher model and Flan-T5-small as the student model. Rationale generation is guided by specific prompts that include the correct answer and step-by-step reasoning cues, encouraging the teacher to produce accurate rationales. The two-stage framework substantially enhances the reasoning performance of the small model compared with conventional fine-tuning and one-stage methods.
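
The sketch below shows how the two stages could be wired together at inference time with Flan-T5-small as the student: one checkpoint maps questions to rationales, and a second checkpoint takes the question plus the self-generated rationale and predicts the answer. The checkpoint paths, input formats, and generation settings are assumptions, not the paper's exact recipe.

```python
# Two-stage student inference (sketch) with Flan-T5-small.
# Assumes two checkpoints fine-tuned beforehand: one on (question -> teacher
# rationale) pairs, one on (question + rationale -> answer) pairs.
# Checkpoint paths and prompt formats are illustrative placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
rationale_model = AutoModelForSeq2SeqLM.from_pretrained("path/to/rationale-checkpoint")
answer_model = AutoModelForSeq2SeqLM.from_pretrained("path/to/answer-checkpoint")

def answer_with_rationale(question: str, options: list[str]) -> str:
    # Stage A: the student generates a rationale on its own.
    rationale_prompt = f"Generate a rationale. Question: {question} Options: {' | '.join(options)}"
    ids = tokenizer(rationale_prompt, return_tensors="pt").input_ids
    rationale = tokenizer.decode(
        rationale_model.generate(ids, max_new_tokens=128)[0], skip_special_tokens=True
    )

    # Stage B: the rationale is appended to the question for answer inference.
    answer_prompt = f"Question: {question} Options: {' | '.join(options)} Rationale: {rationale} Answer:"
    ids = tokenizer(answer_prompt, return_tensors="pt").input_ids
    return tokenizer.decode(
        answer_model.generate(ids, max_new_tokens=8)[0], skip_special_tokens=True
    )
```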

Experimental Evaluation

The efficacy of the Sci-CoT framework was validated on the ARC-Easy and ARC-Challenge datasets. On both datasets, Sci-CoT demonstrated notable improvements over traditional fine-tuning and over the one-stage method that simultaneously generates rationales and infers answers. On ARC-Easy, Sci-CoT raised accuracy from 38.04% to 43.73%, surpassing the baseline fine-tuning method. On ARC-Challenge, results were comparable, indicating that Sci-CoT remains robust in more challenging settings.
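
As a rough picture of how such accuracy numbers can be computed, the snippet below scores predictions against the Hugging Face release of the ARC benchmark; the `predict` callable is a placeholder standing in for any of the compared methods, and the dataset identifier is an assumption about the evaluation setup.

```python
# Accuracy evaluation on ARC-Easy (sketch). Assumes the "allenai/ai2_arc"
# dataset on the Hugging Face Hub; predict() is a placeholder that returns
# the chosen answer label (e.g. "A"/"B"/"C"/"D").
from datasets import load_dataset

def evaluate(predict, split: str = "test") -> float:
    data = load_dataset("allenai/ai2_arc", "ARC-Easy", split=split)
    correct = 0
    for example in data:
        choices = example["choices"]  # {"text": [...], "label": [...]}
        pred = predict(example["question"], choices["text"], choices["label"])
        correct += int(pred == example["answerKey"])
    return correct / len(data)
```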

Comparison with Existing Models

The experimental results showed that an 80-million parameter model using the Sci-CoT framework outperformed larger models such as OPT-175B and BLOOM-176B in few-shot settings. This outcome emphasizes that smaller, efficiently-trained models can compete with, and sometimes exceed, the performance of significantly larger models, thus presenting a viable solution for environments with limited computational resources.

Analysis of Data Size Utilization

A key component of the study investigated the relationship between model performance and training-set size. The analysis indicated that Sci-CoT achieved superior performance while using only 50% of the training data that conventional fine-tuning required to reach the same level. This finding underscores the data efficiency of the Sci-CoT framework when training smaller models, which has practical implications for resource-constrained scenarios.
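
A minimal sketch of how such a data-efficiency comparison can be run: subsample fixed fractions of the training split before fine-tuning and compare the resulting accuracy. The fractions, seed, and the commented-out helpers are illustrative assumptions, not the paper's protocol.

```python
# Data-efficiency study (sketch): train on progressively smaller fractions of
# the training split and compare downstream accuracy. Fractions, seed, and the
# fine_tune/evaluate helpers are placeholders.
from datasets import load_dataset

train = load_dataset("allenai/ai2_arc", "ARC-Easy", split="train")

for fraction in (0.25, 0.5, 0.75, 1.0):
    subset = train.shuffle(seed=42).select(range(int(fraction * len(train))))
    # model = fine_tune(subset)       # placeholder: Sci-CoT or baseline fine-tuning
    # accuracy = evaluate(model)      # placeholder: accuracy on the held-out split
    print(f"{fraction:.0%} of training data -> {len(subset)} examples")
```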

Implications and Future Directions

The findings have significant implications for the development of small, resource-efficient models capable of the high-level reasoning typically reserved for larger LLMs. By demonstrating that smaller models can effectively learn complex reasoning skills through a structured knowledge distillation framework, Sci-CoT paves the way for deploying powerful yet compact models in environments where computational resources or data availability are constrained.

Future research could extend the application of this framework to other reasoning tasks, including symbolic and logical reasoning, to further validate its generalizability. Additionally, exploring the integration of diverse LLMs for constructing robust, general-purpose models through advanced distillation techniques remains a vital area of interest.

Conclusion

The Sci-CoT framework represents a significant advancement in the transfer of reasoning capabilities from LLMs to smaller counterparts through a structured, two-stage knowledge distillation process. By effectively enabling smaller models to perform high-level reasoning tasks in scientific QA, this approach opens new avenues for efficient and powerful model deployment in a variety of practical applications. The potential for broader applicability and further enhancements underscores the value and promise of this innovative method.
