Distilling Mathematical Reasoning Capabilities into Small Language Models

(2401.11864)
Published Jan 22, 2024 in cs.CL and cs.AI

Abstract

This work addresses the challenge of democratizing advanced LLMs by compressing their mathematical reasoning capabilities into sub-billion parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process into equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.

Overview

  • The paper introduces Equation-of-Thought Distillation (EoTD) and Mix Thoughts Distillation (MTD, called Ensemble Thoughts Distillation in the abstract) to improve mathematical reasoning in Small Language Models (SLMs).

  • These methods aim to enhance the capabilities of SLMs without the high computational costs associated with LLMs.

  • EoTD transfers equation-based reasoning from LLMs to SLMs: the SLM learns to formulate a problem as equations, and an external solver evaluates them.

  • MTD combines multiple reasoning formats (CoT, PoT, and EoT) into a single fine-tuning dataset, giving SLMs a diversified range of reasoning strategies for better performance.

  • Empirical results show EoTD and MTD significantly increase the mathematical problem-solving accuracy of SLMs.

Introduction

The study introduces innovative approaches to refining Small Language Models (SLMs) with the goal of democratizing advanced LLMs. The authors propose Equation-of-Thought Distillation (EoTD) and Mix Thoughts Distillation (MTD), specifically designed to enhance mathematical reasoning in SLMs. These approaches represent a leap forward for NLP, granting SLMs sophisticated capabilities previously reserved for computationally intensive LLMs, which are often impractical for widespread use because of their sheer size.

Mathematical Reasoning

Mathematical reasoning remains an Achilles' heel for most AI systems. Recent studies acknowledge the effectiveness of Chain-of-Thought (CoT) prompting in coaxing LLMs to produce intermediate steps for complex problems. Nonetheless, SLMs struggle to match this proficiency because of their inherent size limitations, which underscores the need for efficient distillation techniques. The authors' EoTD framework infuses mathematical reasoning into SLMs by translating problems into solvable systems of equations, without the high computational cost usually associated with LLMs.
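To make the approach concrete, here is a minimal sketch of how an Equation-of-Thought rationale can be evaluated by an external solver. The example problem, the equation format, and the solve_eot helper are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the Equation-of-Thought idea, assuming a sympy-style
# external solver; the example problem and equation format are illustrative.
import sympy as sp

# An EoT rationale for: "Alice has 5 apples. Bob has twice as many.
# How many apples do they have together?" The model emits only equations;
# all arithmetic is offloaded to the solver.
eot_rationale = [
    "alice = 5",
    "bob = 2 * alice",
    "answer = alice + bob",
]

def solve_eot(lines):
    """Parse 'lhs = rhs' lines into a system of sympy equations and solve it."""
    symbols = {}
    equations = []
    for line in lines:
        lhs, rhs = (part.strip() for part in line.split("="))
        symbols.setdefault(lhs, sp.Symbol(lhs))
        equations.append(sp.Eq(symbols[lhs], sp.sympify(rhs, locals=symbols)))
    # Solve the whole system at once, so the equations may appear in any order.
    solution = sp.solve(equations, list(symbols.values()), dict=True)[0]
    return solution[symbols["answer"]]

print(solve_eot(eot_rationale))  # -> 15
```

Because the SLM only emits equations, all arithmetic is delegated to the solver, which is the error-reduction argument developed in the next section.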

Knowledge Distillation and Methodology

Knowledge Distillation, the process of transferring expertise from massive LLMs to more manageable SLMs, underpins both proposed frameworks. EoTD differentiates itself by offloading calculations to an external solver, thus reducing arithmetic errors and simplifying problem-solving for SLMs. To further amplify reasoning capabilities, the study introduces MTD, which fine-tunes SLMs on a combined dataset of CoT, PoT, and EoT rationales. This diversified dataset enriches the SLMs' reasoning knowledge base and significantly sharpens their mathematical problem-solving skills.
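As a rough illustration of the MTD data-construction step, the sketch below assembles a fine-tuning file from LLM-generated rationales in the three thought formats. The record fields, the prompt tags, and the answer-filtering heuristic are assumptions made for illustration; the paper's exact pipeline may differ.

```python
# A minimal sketch of assembling a Mix Thoughts Distillation (MTD) dataset,
# assuming LLM-generated rationales have already been collected.
import json

def build_mtd_dataset(samples, gold_answers):
    """Keep rationales whose final answer matches the gold label and
    format them as (prompt, completion) fine-tuning pairs."""
    dataset = []
    for s in samples:  # each s: {"question", "thought_type", "rationale", "answer"}
        if s["answer"] != gold_answers[s["question"]]:
            continue  # discard rationales that reached a wrong answer
        prompt = f"[{s['thought_type']}] Question: {s['question']}\nAnswer:"
        dataset.append({"prompt": prompt, "completion": " " + s["rationale"]})
    return dataset

samples = [
    {"question": "2 + 3?", "thought_type": "CoT",
     "rationale": "2 plus 3 equals 5. The answer is 5.", "answer": "5"},
    {"question": "2 + 3?", "thought_type": "PoT",
     "rationale": "answer = 2 + 3\nprint(answer)", "answer": "5"},
    {"question": "2 + 3?", "thought_type": "EoT",
     "rationale": "x = 2 + 3\nanswer = x", "answer": "5"},
]
with open("mtd_train.jsonl", "w") as f:
    for record in build_mtd_dataset(samples, {"2 + 3?": "5"}):
        f.write(json.dumps(record) + "\n")
```

Filtering on answer correctness is a common safeguard in distillation pipelines: it keeps the teacher's occasional wrong rationales out of the student's training data.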

Experimental Findings

The research rigorously evaluates EoTD and MTD across multiple SLM variants on diverse mathematical reasoning datasets. The results are compelling: EoTD markedly improves SLM reasoning, raising accuracy by up to 18.87% on certain datasets, while MTD goes further, reaching up to 42.45% accuracy, about a 20% edge over EoTD alone. The study also confirms that a greater volume of heterogeneous reasoning paths correlates with stronger SLM reasoning performance.

Conclusion

These frameworks mark a paradigm shift in harnessing the vast problem-solving prowess of LLMs within a pragmatic, significantly less resource-intensive format. The study's findings directly contribute to the objective of democratizing AI, presenting a compelling case for SLMs capable of advanced reasoning tasks within constrained computational environments. The inherent scalability of these methods suggests a bright future for SLMs, making advanced NLP technologies more universally adoptable.
