Distilling Mathematical Reasoning Capabilities into Small Language Models

(2401.11864)
Published Jan 22, 2024 in cs.CL and cs.AI

Abstract

This work addresses the challenge of democratizing advanced LLMs by compressing their mathematical reasoning capabilities into sub-billion parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process into equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.

Overview

  • The paper introduces Equation-of-Thought Distillation (EoTD) and Mix Thoughts Distillation (MTD, called Ensemble Thoughts Distillation in the abstract) to improve mathematical reasoning in Small Language Models (SLMs).

  • These methods aim to enhance the capabilities of SLMs without the high computational costs associated with LLMs.

  • EoTD transfers equation-based reasoning from LLMs to SLMs: the SLM learns to formulate a problem as equations, and an external solver evaluates them.

  • MTD combines multiple reasoning formats (CoT, PoT, and EoT) into a single fine-tuning dataset, giving SLMs a diversified range of reasoning strategies for better performance.

  • Empirical results show EoTD and MTD significantly increase the mathematical problem-solving accuracy of SLMs.

Introduction

The study introduces innovative approaches to refining Small Language Models (SLMs) with the goal of democratizing advanced LLMs. The authors propose Equation-of-Thought Distillation (EoTD) and Mix Thoughts Distillation (MTD), specifically designed to enhance mathematical reasoning in SLMs. These approaches represent a leap forward for NLP, granting SLMs sophisticated capabilities previously reserved for computationally intensive LLMs, which are often impractical for widespread use because of their sheer size.

Mathematical Reasoning

Mathematical reasoning remains an Achilles' heel for most AI systems. Recent studies acknowledge the effectiveness of Chain-of-Thought (CoT) prompting in coaxing LLMs to produce intermediate steps for complex problems. Nonetheless, SLMs struggle to match this proficiency because of their inherent size limitations, which underscores the need for efficient distillation techniques. The authors' EoTD framework infuses mathematical reasoning into SLMs by translating problems into solvable systems of equations, without the high computational cost usually associated with LLMs.
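To make the approach concrete, here is a minimal sketch of how an Equation-of-Thought rationale can be evaluated by an external solver. The example problem, the equation format, and the solve_eot helper are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the Equation-of-Thought idea, assuming a sympy-style
# external solver; the example problem and equation format are illustrative.
import sympy as sp

# An EoT rationale for: "Alice has 5 apples. Bob has twice as many.
# How many apples do they have together?" The model emits only equations;
# all arithmetic is offloaded to the solver.
eot_rationale = [
    "alice = 5",
    "bob = 2 * alice",
    "answer = alice + bob",
]

def solve_eot(lines):
    """Parse 'lhs = rhs' lines into a system of sympy equations and solve it."""
    symbols = {}
    equations = []
    for line in lines:
        lhs, rhs = (part.strip() for part in line.split("="))
        symbols.setdefault(lhs, sp.Symbol(lhs))
        equations.append(sp.Eq(symbols[lhs], sp.sympify(rhs, locals=symbols)))
    # Solve the whole system at once, so the equations may appear in any order.
    solution = sp.solve(equations, list(symbols.values()), dict=True)[0]
    return solution[symbols["answer"]]

print(solve_eot(eot_rationale))  # -> 15
```

Because the SLM only emits equations, all arithmetic is delegated to the solver, which is the error-reduction argument developed in the next section.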

Knowledge Distillation and Methodology

Knowledge Distillation, the process of transferring expertise from massive LLMs to more manageable SLMs, underpins both proposed frameworks. EoTD differentiates itself by offloading calculations to an external solver, thus reducing arithmetic errors and simplifying problem-solving for SLMs. To further amplify reasoning capabilities, the study introduces MTD, which fine-tunes SLMs on a combined dataset of CoT, PoT, and EoT rationales. This diversified dataset enriches the SLMs' reasoning knowledge base and significantly sharpens their mathematical problem-solving skills.
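As a rough illustration of the MTD data-construction step, the sketch below assembles a fine-tuning file from LLM-generated rationales in the three thought formats. The record fields, the prompt tags, and the answer-filtering heuristic are assumptions made for illustration; the paper's exact pipeline may differ.

```python
# A minimal sketch of assembling a Mix Thoughts Distillation (MTD) dataset,
# assuming LLM-generated rationales have already been collected.
import json

def build_mtd_dataset(samples, gold_answers):
    """Keep rationales whose final answer matches the gold label and
    format them as (prompt, completion) fine-tuning pairs."""
    dataset = []
    for s in samples:  # each s: {"question", "thought_type", "rationale", "answer"}
        if s["answer"] != gold_answers[s["question"]]:
            continue  # discard rationales that reached a wrong answer
        prompt = f"[{s['thought_type']}] Question: {s['question']}\nAnswer:"
        dataset.append({"prompt": prompt, "completion": " " + s["rationale"]})
    return dataset

samples = [
    {"question": "2 + 3?", "thought_type": "CoT",
     "rationale": "2 plus 3 equals 5. The answer is 5.", "answer": "5"},
    {"question": "2 + 3?", "thought_type": "PoT",
     "rationale": "answer = 2 + 3\nprint(answer)", "answer": "5"},
    {"question": "2 + 3?", "thought_type": "EoT",
     "rationale": "x = 2 + 3\nanswer = x", "answer": "5"},
]
with open("mtd_train.jsonl", "w") as f:
    for record in build_mtd_dataset(samples, {"2 + 3?": "5"}):
        f.write(json.dumps(record) + "\n")
```

Filtering on answer correctness is a common safeguard in distillation pipelines: it keeps the teacher's occasional wrong rationales out of the student's training data.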

Experimental Findings

The research rigorously evaluates EoTD and MTD across multiple SLM variants on diverse mathematical reasoning datasets. The results are compelling: EoTD markedly improves SLM reasoning, raising accuracy by up to 18.87% on certain datasets, while MTD goes further, reaching up to 42.45% accuracy, about a 20% edge over EoTD alone. The study also confirms that a greater volume of heterogeneous reasoning paths correlates with stronger SLM reasoning performance.

Conclusion

These frameworks mark a paradigm shift in harnessing the vast problem-solving prowess of LLMs within a pragmatic, significantly less resource-intensive format. The study's findings directly contribute to the objective of democratizing AI, presenting a compelling case for SLMs capable of advanced reasoning tasks within constrained computational environments. The inherent scalability of these methods suggests a bright future for SLMs, making advanced NLP technologies more universally adoptable.
