Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent (2312.08926v2)

Published 14 Dec 2023 in cs.AI and cs.CL

Abstract: LLMs face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems once could be arduous for LLMs, thus leading to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents by meticulous decomposition and modeling of mathematical reasoning process. Specifically, we propose a formal description of the mathematical solving and extend LLMs with an agent-based zero-shot framework named $\mathbf{P}$lanner-$\mathbf{R}$easoner-$\mathbf{E}$xecutor-$\mathbf{R}$eflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions in different grains and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and proposed MathAgents, achieving an increase of $12.3\%$ ($53.9\% \to 66.2\%$) on miniF2F, $9.2\%$ ($49.8\% \to 59.0\%$) on MATH, and $13.2\%$ ($23.2\% \to 35.4\%$) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.

Summary

The paper introduces the Planner-Reasoner-Executor-Reflector (PRER) framework to improve LLMs' mathematical reasoning.
It demonstrates that MathAgent-H outperforms GPT-4 and other baselines by leveraging detailed logical decomposition and self-verification.
Empirical results highlight enhanced error identification and stable inference in complex mathematical problem sets.

Introduction

LLMs are fluent at natural language understanding and generation, yet they still struggle with complex mathematical problems that require parsing the statement, associating domain knowledge, multi-step logical reasoning, and integrating intermediate results. To address this, researchers from Shanghai Jiao Tong University explore augmenting LLMs with a zero-shot, agent-based framework designed for mathematical reasoning.

Methodology

The paper introduces a framework called Planner-Reasoner-Executor-Reflector (PRER) to model the process of solving a mathematical problem. PRER has four components. The Planner and Reasoner carry the core logical reasoning and filter the relevant knowledge; the Executor carries out the chosen mathematical actions; and the Reflector adds self-verification and correction, which improves stability and fault tolerance. Two agents are built on PRER and evaluated on the miniF2F and MATH benchmarks: MathAgent-M, whose actions are adapted to LLM behavior, and MathAgent-H, whose actions align with human reasoning.
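To make the division of labor between the four modules concrete, the following is a minimal sketch of one way such a loop could be wired together. It is not the paper's implementation: the function names (`prer_solve`, `scripted_llm`), the prompt prefixes, the `DONE` and `OK` signals, and the step limit are all assumptions made for illustration, and the language model is replaced by a scripted stub so the example runs on its own.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for a language model call: prompt in, text out.
LLM = Callable[[str], str]


def scripted_llm(replies: Iterable[str]) -> LLM:
    """Return a fake LLM that ignores the prompt and replays fixed replies."""
    it = iter(replies)
    return lambda prompt: next(it)


def prer_solve(problem: str, llm: LLM, max_steps: int = 8) -> List[str]:
    """Run an assumed Planner -> Reasoner -> Executor -> Reflector loop.

    Returns the intermediate results that the Reflector accepted.
    """
    accepted: List[str] = []
    for _ in range(max_steps):
        context = "\n".join(accepted)
        # Planner: decide what to do next, or signal that the problem is solved.
        plan = llm(f"PLAN\n{problem}\n{context}")
        if plan.strip() == "DONE":  # assumed stop signal
            break
        # Reasoner: turn the plan into a concrete action.
        action = llm(f"REASON\n{plan}\n{context}")
        # Executor: carry out the action (here, also delegated to the LLM).
        result = llm(f"EXECUTE\n{action}")
        # Reflector: verify the result; rejected results are dropped,
        # so the next iteration re-plans from the last accepted state.
        verdict = llm(f"REFLECT\n{action}\n{result}")
        if verdict.strip() == "OK":  # assumed acceptance signal
            accepted.append(result)
    return accepted


if __name__ == "__main__":
    # First attempt is rejected by the Reflector, second is accepted.
    llm = scripted_llm([
        "add the numbers", "compute 2 + 3", "6", "RETRY",
        "add the numbers", "compute 2 + 3", "5", "OK",
        "DONE",
    ])
    print(prer_solve("What is 2 + 3?", llm))
```

In this sketch a rejected step simply triggers another planning round; a real system could instead feed the Reflector's critique back into the Planner or Reasoner prompt.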

Performance and Analysis

The experimental results show clear gains: MathAgent-H outperforms existing baselines and GPT-4, especially on harder problem sets. Against GPT-4, the abstract reports accuracy rising from 53.9% to 66.2% on miniF2F, from 49.8% to 59.0% on MATH, and from 23.2% to 35.4% on level-5 MATH problems. The granularity of actions within the Reasoner is the main difference between the two MathAgents, and it affects both their efficacy and how the modules collaborate. With finer-grained actions, MathAgent-H makes more accurate inferences on complex tasks and is better at identifying and correcting errors.
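The gains quoted in the abstract are differences in accuracy, that is, percentage points rather than relative improvements. The short check below recomputes them from the before and after accuracies given in the abstract; the dictionary keys and variable names are only illustrative.

```python
# Accuracy (%) of GPT-4 alone vs. with the proposed agents, as quoted in the abstract.
reported = {
    "miniF2F": (53.9, 66.2),
    "MATH": (49.8, 59.0),
    "MATH level-5": (23.2, 35.4),
}

# Absolute gain in percentage points, rounded to one decimal place.
gains = {name: round(after - before, 1) for name, (before, after) in reported.items()}

# Relative gain, for comparison (percent of the baseline accuracy).
relative = {
    name: round(100 * (after - before) / before, 1)
    for name, (before, after) in reported.items()
}

if __name__ == "__main__":
    for name in reported:
        print(f"{name}: +{gains[name]} points ({relative[name]}% relative)")
```

The recomputed differences match the abstract for miniF2F (12.3) and MATH (9.2). For level-5 MATH the quoted endpoints differ by 12.2 points, whereas the abstract states 13.2; from this page alone it is not possible to tell which of the figures is the misprint.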

Conclusion

The research advances the modeling of complex mathematical reasoning with LLM-based math agents. By systematically decomposing the mathematical reasoning process and mapping it onto an agent framework, the proposed MathAgents outperform several baselines, including GPT-4, on miniF2F and MATH. The paper also acknowledges limitations that call for further investigation and points to future work on using LLMs as agents.