
Abstract

This study proposes a method for knowledge distillation (KD) of fine-tuned LLMs into smaller, more efficient, and more accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model (a neural network) using the prediction probabilities (as soft labels) of the LLM, which serves as the teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To validate the KD approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions, along with three mathematical reasoning datasets of student-written responses graded by human experts. We compared accuracy with a state-of-the-art (SOTA) distilled model, TinyBERT, and artificial neural network (ANN) models. Results show that the KD approach achieves 1% and 4% higher scoring accuracy than ANN and TinyBERT, respectively, and accuracy comparable to the teacher model. Furthermore, the student model has only 0.02M parameters, roughly 10,000 times fewer than the teacher model, and performs inference about 10 times faster than TinyBERT. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.

Figure: The KD architecture uses soft labels from a teacher model to train a student model.

Overview

  • The paper addresses the integration of AI and LLMs in education, focusing on personalized learning and automatic scoring.

  • LLM deployment is limited by model size and computational demands, restricting use in educational settings with lower-end hardware.

  • Knowledge distillation is used to train smaller models that replicate the performance of large LLMs with less computational power.

  • Experiments on a large dataset and additional educational datasets show the distilled models maintain accuracy while reducing size and computational load.

  • The research supports the democratization of AI in education by enabling sophisticated models to function on less powerful devices.

Introduction

The integration of AI and LLMs into education has resulted in significant advancements in personalized learning experiences. LLMs show promise in enhancing learning by providing personalized content, offering learning support, and facilitating automatic scoring systems. However, their deployment faces challenges due to their large size and computational requirements. This is particularly problematic for educational environments with hardware limitations, such as mobile devices, tablets, and school-provided laptops without high-end graphics processing units or tensor processing units.

Knowledge Distillation for LLM

The study examines the distillation of knowledge from fine-tuned LLMs into smaller, more manageable models that can run on devices with fewer resources. The process follows a "teacher-student" learning approach in which a smaller "student" model is trained on the predictive probabilities generated by a larger LLM, the "teacher" model. A specialized loss function trains the student to reproduce the teacher's output distribution, allowing it to approach the LLM's performance at a fraction of the computational cost.
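As a rough illustration of this teacher-student setup (a minimal sketch, not the paper's exact implementation), the code below trains a small student classifier against a teacher's softened output probabilities using a temperature-scaled KL-divergence loss blended with cross-entropy on the human-assigned scores. The layer sizes, temperature, loss weighting, and the assumption that responses arrive as fixed-size embeddings are illustrative choices.

```python
# Minimal knowledge-distillation sketch (PyTorch). Hyperparameters, layer sizes,
# and the response-embedding input are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentScorer(nn.Module):
    """Tiny feed-forward student that maps a response embedding to score classes."""
    def __init__(self, input_dim=128, hidden_dim=64, num_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits

def distillation_loss(student_logits, teacher_probs, hard_labels, T=2.0, alpha=0.7):
    """Blend a soft-label KL term (teacher) with cross-entropy on human-graded labels."""
    # Soften the student's distribution with temperature T.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # Re-soften the teacher's probabilities with the same temperature.
    soft_teacher = F.softmax(torch.log(teacher_probs + 1e-9) / T, dim=-1)
    # T^2 rescales gradients to keep the two terms on a comparable scale.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, hard_labels)
    return alpha * kd + (1 - alpha) * ce

# Usage with dummy tensors standing in for embedded student responses.
student = StudentScorer()
x = torch.randn(8, 128)                                # batch of response embeddings
teacher_probs = F.softmax(torch.randn(8, 3), dim=-1)   # teacher soft labels
labels = torch.randint(0, 3, (8,))                     # human-assigned scores
loss = distillation_loss(student(x), teacher_probs, labels)
loss.backward()
```

The temperature softens both distributions so the student also learns from the relative probabilities the teacher assigns to the non-chosen score classes, which is where much of the teacher's "dark knowledge" lives.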

Empirical Validation

Researchers conducted experiments on a large dataset, labeled 7T, containing thousands of student-written responses to science questions, along with three additional datasets from the educational domain. The smaller models trained via knowledge distillation were compared with the original neural network models on automatic-scoring accuracy. The distilled student models achieved accuracy comparable to the LLM on the 7T dataset and higher accuracy than the original neural networks on the other datasets, while being considerably smaller and computationally lighter.
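For a concrete sense of how such size and speed comparisons can be run, here is a minimal sketch for measuring parameter count, inference latency, and scoring accuracy. The helper names and the reuse of the StudentScorer class from the previous sketch are assumptions for illustration, not the authors' evaluation script.

```python
# Rough sketch of a size/latency/accuracy comparison between a distilled student
# and a larger teacher. Model handles, input shapes, and the timing loop are
# illustrative assumptions.
import time
import torch

def count_parameters(model):
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def mean_latency_ms(model, example_input, runs=50):
    """Average per-forward-pass latency in milliseconds."""
    model.eval()
    start = time.perf_counter()
    for _ in range(runs):
        model(example_input)
    return 1000 * (time.perf_counter() - start) / runs

@torch.no_grad()
def accuracy(model, inputs, labels):
    """Fraction of predictions matching human-assigned scores."""
    preds = model(inputs).argmax(dim=-1)
    return (preds == labels).float().mean().item()

# Example usage with the StudentScorer sketch above (the teacher is omitted here,
# since loading a fine-tuned LLM depends on the specific checkpoint):
# student = StudentScorer()
# print(count_parameters(student), mean_latency_ms(student, torch.randn(1, 128)))
```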

Impact on Education

The implications of using distilled models in education are far-reaching. By enabling effective automatic scoring systems to operate on less powerful hardware, this research contributes to the democratization of AI in education. It offers a viable solution to the problem of deploying sophisticated AI models in resource-constrained environments and aligns with the increasing demand for personalized learning and adaptive assessment tools in the educational sector. The study provides a proof of concept for the successful application of knowledge distillation in educational technology, highlighting its potential to transform educational assessment practices and ensure equitable access to advanced AI technologies in typical school settings.
