Abstract

LLMs have made impressive progress on simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employ the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed DotaMath. DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, obtaining fine-grained feedback from the code interpreter, and engaging in self-reflection and correction. By annotating diverse interactive tool-use trajectories and applying query evolution to the GSM8K and MATH datasets, we generate an instruction fine-tuning dataset called DotaMathQA with 574K query-response pairs. We train a series of base LLMs with imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared with open-source LLMs across various in-domain and out-of-domain benchmarks. Notably, DotaMath-deepseek-7B achieves 64.8% on the competition-level MATH dataset and 86.7% on GSM8K, and remains strongly competitive across a series of in-domain and out-of-domain benchmarks (Avg. 80.1%). Looking forward, we anticipate that the DotaMath paradigm will open new pathways for addressing intricate mathematical problems. Our code is publicly available at https://github.com/ChengpengLi1003/DotaMath.

Figure: DotaMath's decomposition and self-correction solving a MATH test-set problem through iterative code refinement.

Overview

  • DotaMath introduces a novel method to enhance the mathematical reasoning abilities of LLMs through decomposition of thought, code assistance, and self-correction.

  • The methodology involves a cycle of problem decomposition, Python code generation and execution, and intermediate feedback for refining solutions, supported by a substantial dataset termed DotaMathQA.

  • Evaluation results show the model's strong performance on complex tasks, with high accuracy on key mathematical benchmarks such as MATH and GSM8K.

An Overview of DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

DotaMath introduces an approach to enhancing the mathematical reasoning capabilities of LLMs. Its methodology rests on three core strategies: decomposition of thought, intermediate process display through code assistance, and self-correction. This overview dissects the architecture and implications of DotaMath's approach, shedding light on its efficacy and practical significance based on benchmark evaluations.

Methodology and Dataset Construction

DotaMath leverages multiple rounds of interaction between the model and a Python code interpreter to deliver precise solutions to complex mathematical problems. The methodology proceeds in several key phases. First, DotaMath breaks a given mathematical problem down into logical subtasks (termed "decomposition of thought"), making the problem more manageable. It then generates and executes Python code to solve each subtask, with the interpreter's intermediate output fed back to the model to guide further analysis (termed "intermediate process display"). When execution fails or yields an inconsistent result, the model reflects on the interpreter's feedback and revises its code (termed "self-correction"). A sketch of this loop appears below.
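To make the interaction loop concrete, here is a minimal sketch in Python of how such a decompose-execute-correct cycle could be orchestrated. This is an illustration, not the authors' implementation: llm_generate is a hypothetical stand-in for a call to the fine-tuned model, and the prompt wording, code-block parsing, and turn limit are all assumptions.

    import subprocess
    import sys

    MAX_TURNS = 3  # upper bound on self-correction rounds

    def llm_generate(prompt: str) -> str:
        """Placeholder for a call to the fine-tuned model; wire up a real client here."""
        raise NotImplementedError("connect this to your LLM inference endpoint")

    def run_code(code: str, timeout: int = 10):
        """Execute generated Python in a subprocess, returning (stdout, error)."""
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        error = proc.stderr if proc.returncode != 0 else None
        return proc.stdout, error

    def extract_code_block(text: str) -> str:
        """Pull the first fenced Python block out of a model response."""
        start = text.find("```python")
        if start == -1:
            return text  # fall back to treating the whole response as code
        start += len("```python")
        end = text.find("```", start)
        return text[start:end] if end != -1 else text[start:]

    def solve(question: str) -> str:
        # Turn 1: ask the model to decompose the problem into subtasks and
        # emit Python code that solves them step by step.
        prompt = (
            "Decompose the problem into logical subtasks, then write Python "
            "code that solves each subtask and prints the result.\n\n" + question
        )
        for _ in range(MAX_TURNS):
            response = llm_generate(prompt)
            output, error = run_code(extract_code_block(response))
            if error is None:
                # Intermediate process display: feed execution output back so
                # the model can verify intermediate values and state the answer.
                return llm_generate(
                    prompt + "\n" + response + "\nOutput:\n" + output + "\nFinal answer:"
                )
            # Self-correction: append the interpreter's error message and ask
            # the model to reflect on it and revise the code.
            prompt += (
                "\n" + response + "\nExecution error:\n" + error
                + "\nReflect on the error and correct the code.\n"
            )
        return "unresolved after maximum correction rounds"

Bounding the loop at a small number of turns keeps inference cost predictable while still allowing several rounds of interpreter-guided correction.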

Data Construction and Instruction Fine-tuning

For training, the DotaMathQA dataset plays an instrumental role. It is built by annotating tool-use trajectories on the human-curated GSM8K and MATH datasets and augmenting them with query evolution techniques, yielding 574K query-response pairs. Notably, the data includes both single-turn and multi-turn QA instances, with the latter requiring multiple interactions for self-correction; a hypothetical sketch of such a record follows.
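The exact schema of DotaMathQA is not reproduced here; the following is a hypothetical sketch of what a multi-turn record could look like, with field names (query, trajectory) and role labels chosen purely for illustration.

    # Hypothetical shape of one multi-turn DotaMathQA-style instance; the
    # authors' actual schema may differ.
    example_record = {
        "query": "An evolved variant of a GSM8K or MATH question ...",
        "trajectory": [
            {"role": "assistant", "content": "Subtask 1 analysis plus Python code ..."},
            {"role": "interpreter", "content": "Traceback: NameError: name 'x' is not defined"},
            {"role": "assistant", "content": "Reflection on the error and corrected code ..."},
            {"role": "interpreter", "content": "42"},
            {"role": "assistant", "content": "The final answer is 42."},
        ],
    }
    # A single-turn instance would carry just one assistant message whose
    # code executes cleanly on the first try.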

Evaluation Outcomes

The DotaMath models were evaluated against both in-domain and out-of-domain benchmarks, and the results highlight their ability to handle complex tasks. Specifically, DotaMath-deepseek-7B reaches 64.8% accuracy on the challenging MATH dataset and 86.7% on GSM8K, and it remains strongly competitive with an average score of 80.1% across the full suite of benchmarks.

Practical and Theoretical Implications

The practical utility of DotaMath is multifaceted. In educational technology, it can serve as a robust tool for solving intricate mathematical problems, aiding students and educators alike. The theoretical implications are equally significant: detailed feedback through intermediate process display keeps the model's reasoning aligned with human thought processes, enhancing the interpretability and reliability of the generated solutions.

Future Directions

Looking forward, DotaMath sets the stage for further advances in mathematical reasoning with LLMs. Future research could optimize the decomposition strategies and refine the self-correction mechanisms to handle even greater complexity. Extending the approach to interdisciplinary problem-solving across STEM fields could also unlock new LLM capabilities.

Conclusion

DotaMath marks a clear advance in equipping LLMs with comprehensive mathematical reasoning capabilities. By combining decomposition of thought, intermediate process display, and self-correction, it overcomes limitations that traditional LLMs face in mathematical contexts. Its practical utility and theoretical contributions underscore its impact and point to further innovation in this domain.
