Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation (2305.14386v1)

Published 22 May 2023 in cs.LG, cs.AI, and cs.CL

Abstract: In this paper, we present a novel approach for distilling math word problem solving capabilities from LLMs into smaller, more efficient student models. Our approach is designed to consider the student model's weaknesses and foster a tailored learning experience by generating targeted exercises aligned with educational science principles, such as knowledge tracing and personalized learning. Concretely, we let GPT-3 be a math tutor and run two steps iteratively: 1) assessing the student model's current learning status on a GPT-generated exercise book, and 2) improving the student model by training it with tailored exercise samples generated by GPT-3. Experimental results reveal that our approach outperforms LLMs (e.g., GPT-3 and PaLM) in accuracy across three distinct benchmarks while employing significantly fewer parameters. Furthermore, we provide a comprehensive analysis of the various components within our methodology to substantiate their efficacy.


Summary

  • The paper introduces CEMAL, a framework using LLM-generated tailored exercises to address specific weaknesses in math word problem solvers.
  • The methodology employs an iterative training process with problem and solution analogy strategies to enhance robustness and generalization.
  • Empirical results demonstrate state-of-the-art performance on datasets like MAWPS, ASDiv-a, and SVAMP with reduced computational overhead.

Customized Exercise Generation for Math Word Problem Solvers

Introduction

The paper "Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation" explores the integration of LLMs as pedagogical tools to enhance the learning efficacy of smaller Math Word Problem (MWP) solvers. The authors propose a framework called Customized Exercise for MAth Learning (CEMAL) which leverages LLMs to generate exercises tailored to the explicit weaknesses of the student model. This approach aims to distill the mathematical reasoning capabilities of LLMs into smaller models, achieving competitive performance with significantly less computational overhead.

Methodology

Iterative Training Framework

CEMAL employs an iterative framework in which a student MWP solver is trained using exercises generated by an LLM acting as a tutor. The process begins with an initial training phase on a set of augmented MWPs, followed by progressive refinement through exercises targeting specific weaknesses identified during model evaluation.

Figure 1: The overall iterative framework of CEMAL. After one round of training, the student, a small MWP solver, is evaluated on exercises provided by the LLM teacher; the LLM then generates customized exercises targeting the student's knowledge state and weaknesses, enabling tailored improvement in overall performance.
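To make the loop concrete, the following Python sketch illustrates the assess-then-improve cycle. It is an illustration only: the student and teacher interfaces (`student.fit`, `student.solve`, `llm_teacher.build_exercise_book`, `llm_teacher.generate_targeted_exercises`) are hypothetical placeholders, not the authors' released code.

```python
def train_cemal(student, llm_teacher, seed_problems, num_rounds=5):
    # Initial phase: train on LLM-augmented seed problems and build the
    # GPT-generated exercise book used for assessment.
    exercise_book = llm_teacher.build_exercise_book(seed_problems)
    student.fit(seed_problems)

    for _ in range(num_rounds):
        # Step 1: assess the student's knowledge state on the exercise book.
        failures = [ex for ex in exercise_book
                    if student.solve(ex.problem) != ex.answer]
        if not failures:
            break  # no detected weaknesses left to target

        # Step 2: have the LLM teacher generate exercises tailored to the
        # failures (problem/solution analogies of the missed problems).
        tailored = llm_teacher.generate_targeted_exercises(failures)
        student.fit(tailored)  # progressive refinement, not one-shot augmentation

    return student
```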

Exercise Generation Technique

The core component of CEMAL is its ability to generate exercises that are specifically tailored to address the shortcomings of the student model. This is accomplished by employing both problem analogy and solution analogy strategies. The LLM generates exercises with modified problem statements that maintain the same underlying mathematical structure, ensuring that the solver gains robustness through exposure to varied linguistic contexts.
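As an illustration of the problem-analogy strategy, a generation request to the LLM teacher might look like the sketch below. The prompt wording and the helper name are assumptions made for illustration, not the paper's actual prompts; the call uses the legacy GPT-3-era `openai.Completion` API.

```python
import openai  # legacy (pre-1.0) openai-python client, GPT-3-era Completion API

# Hypothetical problem-analogy prompt: keep the underlying equation structure
# of a problem the student failed, but vary the surface story.
PROMPT_TEMPLATE = """Here is a math word problem and its solution equation:

Problem: {problem}
Equation: {equation}

Write a new word problem with a different story but the same underlying
equation structure, followed by its solution equation."""

def generate_problem_analogy(problem: str, equation: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3-era completion model
        prompt=PROMPT_TEMPLATE.format(problem=problem, equation=equation),
        max_tokens=256,
        temperature=0.7,  # some diversity across generated analogies
    )
    return response["choices"][0]["text"].strip()
```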

A critical part of this approach is the generation of an "exercise book," a diverse validation set derived from the training data. Because its problems differ in surface form from the training instances, it supports a comprehensive evaluation of the student's strengths and weaknesses without rewarding memorized solutions.
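A plausible construction of the exercise book, assuming it is built by sampling training problems and generating analogies for each (the paper's exact sampling procedure is not reproduced here), might look like:

```python
import random

def build_exercise_book(train_problems, sample_size=500, per_problem=2, seed=0):
    """Hypothetical exercise-book builder: derive a diverse validation set
    from the training data via LLM-generated analogies (see the sketch above).
    Each problem is assumed to be a dict with 'text' and 'equation' keys."""
    rng = random.Random(seed)
    sampled = rng.sample(train_problems, min(sample_size, len(train_problems)))
    book = []
    for prob in sampled:
        for _ in range(per_problem):
            book.append(generate_problem_analogy(prob["text"], prob["equation"]))
    return book
```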

Empirical Results

The experimental evaluation demonstrated that CEMAL achieves state-of-the-art results on multiple datasets, including MAWPS, ASDiv-a, and SVAMP, consistently outperforming traditional fine-tuning baselines and previous knowledge distillation techniques.

Figure 2: Accuracy versus model size for representative baselines and the proposed approach on the SVAMP dataset. The method achieves performance competitive with LLMs while using significantly fewer parameters.

A particularly noteworthy outcome is CEMAL's performance in out-of-distribution (OOD) scenarios, where the student solver's accuracy greatly benefited from the custom-tailored exercises. This suggests that the model's understanding generalizes beyond the specific distribution of its initial training data.

Analysis of Strategies and Components

The paper investigates the problem generation strategies employed during training, explicitly comparing targeted and random exercise generation. Targeted exercises, which directly address the solver's detected weaknesses, significantly enhance model performance, especially in OOD settings. The analysis also highlights the importance of the exercise book in providing a more balanced and comprehensive evaluation.

Figure 3: Performance comparison between one-time augmentation and progressive augmentation on SVAMP under the out-of-distribution setting.
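Read as code, the targeted-versus-random comparison amounts to a different choice of reference problems for exercise generation; the hypothetical selector below is only meant to illustrate that distinction.

```python
import random

def pick_reference_problems(exercise_book, failures, mode="targeted", k=100):
    """Hypothetical selector for the ablation above: 'targeted' conditions
    exercise generation on detected weaknesses; 'random' ignores the
    student's knowledge state."""
    if mode == "targeted":
        return failures[:k]  # problems the student answered incorrectly
    return random.sample(exercise_book, min(k, len(exercise_book)))
```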

Conclusion

The CEMAL framework presents a novel method for utilizing LLMs in a tutorial capacity to enhance smaller models' problem-solving capabilities. By systematically generating and integrating targeted exercises, CEMAL enables student solvers to attain high accuracy with reduced computational complexity compared to LLMs alone. The promising results, particularly in generalization and robustness, suggest that this approach can be expanded to other domains where LLMs can guide the learning process of smaller, more efficient AI systems.

Future work could explore autonomous exercise generation without reference problems and strengthen the quality-control mechanisms for generated problems, further refining and extending the efficacy of the CEMAL framework.
