ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

(2309.17452)
Published Sep 29, 2023 in cs.CL and cs.AI

Abstract

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model that achieves an accuracy exceeding 50% on MATH, which significantly outperforms GPT-4's CoT result, and is competitive with GPT-4 solving problems with programs.Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.

ToRA models set a new state of the art among open-source LLaMA-2-based models, with ToRA-Code-34B exceeding GPT-4's chain-of-thought result on MATH and rivaling GPT-4 when it solves problems with programs.

Overview

  • The paper introduces ToRA, a series of models that integrate natural language reasoning with program-based tool use to improve LLMs' performance in advanced mathematical reasoning tasks.

  • ToRA combines rationale-based methods and program-based methods using curated tool-use trajectories and imitation learning, resulting in significant improvements on mathematical reasoning datasets.

  • Experimental results demonstrated substantial performance gains: ToRA-7B improved on the best previous open-source model by 22% absolute on the MATH dataset, and ToRA-Code-34B became the first open-source model to surpass 50% accuracy on the same dataset.

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

The paper titled "ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving" addresses the challenges faced by open-source LLMs in advanced mathematical reasoning tasks. The authors introduce ToRA, which stands for Tool-integrated Reasoning Agents, a series of models that integrate natural language reasoning with program-based tool use. This combination aims to leverage the semantic and abstract reasoning capabilities of LLMs alongside the precise computational abilities of external tools.

Approach

The authors developed ToRA by enhancing open-source models to interleave natural language reasoning with program-based tool use. The method draws on two primary approaches, contrasted in the sketch after the list below:

  1. Rationale-Based Methods: Step-by-step natural language reasoning.
  2. Program-Based Methods: Solving tasks by synthesizing and executing programs.
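
To make the contrast concrete, the following sketch solves a toy word problem both ways; the problem text and strings are illustrative placeholders, not actual ToRA prompts or model outputs:

```python
problem = ("A robe takes 2 bolts of blue fiber and half that much "
           "white fiber. How many bolts in total?")

# 1) Rationale-based (chain-of-thought): the model reasons entirely in text,
#    so any arithmetic slip goes uncorrected.
rationale = (
    "Blue fiber: 2 bolts. White fiber: half of 2 = 1 bolt. "
    "Total: 2 + 1 = 3 bolts. The answer is 3."
)

# 2) Program-based (PAL/PoT-style): the model writes a program and an
#    interpreter performs the computation exactly.
program = """
blue = 2
white = blue / 2
answer = blue + white
"""
namespace = {}
exec(program, namespace)
print(namespace["answer"])  # -> 3.0
```

The program-based route trades some expressive flexibility for exact computation; ToRA's interleaved format aims to keep both.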

ToRA aims to synergize these methods by generating comprehensive annotations (interactive tool-use trajectories) for mathematical problems and applying imitation learning on these annotations. The key components of the approach, illustrated by the inference sketch after the list, include:

  • Curating Tool-Use Trajectories: Using GPT-4 to generate high-quality trajectories for mathematical problems from datasets like GSM8K and MATH.
  • Imitation Learning: Training models on curated datasets to understand and utilize interactive tool-use trajectories.
  • Output Space Shaping: Enhancing the model's ability to explore diverse valid trajectories through additional training on sampled and corrected outputs.
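
At inference time, the trained model alternates rationales, programs, and execution outputs in a loop. Below is a minimal sketch of that tool-integrated loop; `generate` is a hypothetical LLM call, and the `<code>`/`<output>` delimiters are simplifications assumed for this sketch, not ToRA's exact trajectory format:

```python
import re

# Matches the program portion of a model step in this sketch's format.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def solve(problem: str, generate, max_rounds: int = 3) -> str:
    """Alternate rationale -> program -> execution output until a final answer."""
    context = problem + "\n"
    for _ in range(max_rounds):
        # The model writes natural-language reasoning, optionally followed by a
        # program; it stops before producing the tool output itself.
        step = generate(context, stop=["<output>"])
        context += step
        match = CODE_BLOCK.search(step)
        if match is None:
            return step  # no program emitted: the text is the final answer
        namespace = {}
        exec(match.group(1), namespace)  # tool call: run the program in Python
        result = namespace.get("answer")  # sketch assumes programs set `answer`
        # Feed the execution output back so the next rationale can build on it.
        context += f"<output>{result}</output>\n"
    return context
```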

Experimental Results

ToRA models were evaluated on ten diverse mathematical reasoning datasets. The results indicated significant performance improvements over previous state-of-the-art models. Key findings include:

  • Significant Improvements: ToRA models showed 13%-19% absolute improvements on average compared to existing open-source models.
  • Exceptional Performance: ToRA-7B achieved 44.6% accuracy on the competition-level MATH dataset, which is a 22% absolute improvement over the best previous open-source model, WizardMath-70B.
  • Open-Source Achievements: ToRA-Code-34B became the first open-source model to surpass 50% accuracy on the MATH dataset, outperforming GPT-4's chain-of-thought result and competing with GPT-4 when it solves problems with programs.

Implications

The results suggest several important implications for AI and mathematical problem solving:

  • Synergistic Reasoning: Integrating natural language reasoning with program-based tool use can significantly enhance the problem-solving capabilities of LLMs, especially in complex domains like mathematics.
  • Training Strategies: Imitation learning combined with output space shaping (sketched after this list) presents a promising approach to training more flexible and capable models.
  • Open-Source Advantages: Achieving state-of-the-art performance with open-source models opens new avenues for widespread access and research in mathematical reasoning.
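
For intuition, here is a rough sketch of how output space shaping can augment the imitation data: sample diverse trajectories from the fine-tuned model, keep the valid ones, and repair invalid ones with a teacher model. The `student.sample` and `teacher.correct` interfaces are hypothetical, and the details are assumptions rather than the paper's exact procedure:

```python
def shape_output_space(problems, student, teacher, k=8):
    """Collect extra training trajectories by sampling and correcting."""
    shaped = []
    for prob in problems:
        # Sample diverse candidate trajectories from the fine-tuned student.
        for traj in student.sample(prob.question, n=k, temperature=0.9):
            if traj.final_answer == prob.gold_answer:
                shaped.append((prob.question, traj))  # keep valid trajectories
            else:
                fixed = teacher.correct(prob.question, traj)  # teacher repairs it
                if fixed and fixed.final_answer == prob.gold_answer:
                    shaped.append((prob.question, fixed))
    return shaped  # used alongside the imitation data for further fine-tuning
```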

Future Directions

This research paves the way for exploring several future directions in the field of AI and mathematical problem solving:

  1. Enhanced Tool Use: Expanding the range of external tools and improving the integration mechanism could further increase the models' performance.
  2. Generalization: Understanding and overcoming the remaining challenges in generalization to out-of-distribution tasks.
  3. Complex Reasoning: Developing methods to handle even more complex reasoning steps, including diagram understanding and multi-step problem solving.
  4. Interactive Learning: Introducing more dynamic interaction protocols during training to simulate more realistic problem-solving scenarios.

Overall, ToRA's development and the accompanying results highlight the substantial potential of combining various reasoning strategies to enhance the capabilities of AI models in specialized domains. This research sets a strong foundation for future advancements in AI-driven mathematical reasoning.
