Emergent Mind

Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

(2312.08926)
Published Dec 14, 2023 in cs.AI and cs.CL

Abstract

LLMs face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems at once could be arduous for LLMs, thus leading to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents by meticulous decomposition and modeling of the mathematical reasoning process. Specifically, we propose a formal description of mathematical solving and extend LLMs with an agent-based zero-shot framework named Planner-Reasoner-Executor-Reflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions in different grains and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and the proposed MathAgents, achieving an increase of 12.3% (53.9% → 66.2%) on miniF2F, 9.2% (49.8% → 59.0%) on MATH, and 13.2% (23.2% → 35.4%) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.

Overview

  • Researchers introduced a framework called PRER to enhance LLMs for mathematical reasoning.

  • PRER consists of Planner, Reasoner, Executor, and Reflector components to solve math problems.

  • Two MathAgents are implemented: MathAgent-M, whose actions are adapted to LLMs, and MathAgent-H, whose actions align with human reasoning.

  • MathAgent-H outperforms existing models and GPT-4 on complex mathematical problems.

  • The study demonstrates advances in LLM-based math agents and suggests directions for future research.

Introduction

LLMs are fluent at natural language understanding and generation, yet they still struggle with complex mathematical problems that require parsing the statement, associating domain knowledge, performing compound logical reasoning, and integrating intermediate rationales. To mitigate these challenges, researchers from Shanghai Jiao Tong University explore an approach that extends LLMs with a zero-shot, agent-based framework designed for mathematical reasoning.

Methodology

The study introduces a framework called Planner-Reasoner-Executor-Reflector (PRER) to formally describe the process of solving a mathematical problem. PRER comprises four components. Planner and Reasoner carry the core of the logical reasoning and filter the pertinent knowledge. Executor carries out the targeted mathematical actions. Reflector adds self-verification and correction, which improves stability and fault tolerance. Two agents are built on this framework and evaluated on the miniF2F and MATH benchmarks: MathAgent-M, whose actions are adapted to the model's behavior, and MathAgent-H, whose actions mirror human reasoning.
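The division of labor among the four modules can be sketched as a simple control loop. The code below is only an illustrative toy, not the paper's implementation: the names `State`, `run_prer`, and the four module functions are hypothetical, each module is a plain Python function standing in for what would be an LLM call, and the "problem" is just summing the integers from 1 to n. The Reasoner's first attempt is deliberately wrong so that the Reflector's self-check has something to reject.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class State:
    """Working memory shared by the four modules (hypothetical structure)."""
    n: int                                             # toy problem: 1 + 2 + ... + n
    accepted: List[int] = field(default_factory=list)  # results the Reflector verified
    rejected: List[int] = field(default_factory=list)  # results the Reflector refused
    answer: Optional[int] = None


def planner(state: State) -> str:
    # Choose the next action; stop once a verified result exists.
    return "finish" if state.accepted else "calculate"


def reasoner(state: State, action: str) -> Callable[[int], int]:
    # Decide how to carry out the action. The first attempt is
    # deliberately faulty so the Reflector has an error to catch.
    if not state.rejected:
        return lambda n: n * (n - 1) // 2   # wrong formula
    return lambda n: n * (n + 1) // 2       # closed form for 1 + ... + n


def executor(state: State, formula: Callable[[int], int]) -> int:
    # Carry out the computation chosen by the Reasoner.
    return formula(state.n)


def reflector(state: State, result: int) -> bool:
    # Self-verification against an independent brute-force check.
    return result == sum(range(1, state.n + 1))


def run_prer(n: int, max_rounds: int = 5) -> State:
    state = State(n)
    for _ in range(max_rounds):
        action = planner(state)
        if action == "finish":
            state.answer = state.accepted[-1]
            break
        formula = reasoner(state, action)
        result = executor(state, formula)
        if reflector(state, result):
            state.accepted.append(result)
        else:
            state.rejected.append(result)   # triggers another planning round
    return state


final = run_prer(10)
print(final.answer, final.rejected)
```

With n = 10, the first round produces 45 and is rejected, the second produces 55 and is accepted, and the third round finishes. In an actual agent the brute-force check would not be available; the Reflector would have to rely on the model's own verification, which is an assumption this toy sidesteps.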

Performance and Analysis

The experimental results show that MathAgent-H outperforms existing baselines and GPT-4, especially on complex problem sets. The main difference between the two MathAgents is the granularity of actions within the Reasoner, which influences their efficacy and the way the modules collaborate. With more detailed actions, MathAgent-H navigates complex tasks better and makes more accurate inferences, and it is better at identifying and correcting errors.

Conclusion

The research advances the modeling of complex mathematical reasoning with LLM-based math agents. By systematically decomposing the mathematical reasoning process and integrating it with an agent-driven framework, the study outperforms several baselines and opens directions for future work in the domain, while acknowledging limitations that invite continued investigation.
