Abstract

Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning across languages: owing to the imbalance of multilingual training data, reasoning in a dominant language such as English is superior to reasoning in other languages. To enhance reasoning in non-dominant languages, we propose the Multilingual-Alignment-as-Preference Optimization framework (MAPO), which aims to align the reasoning processes in other languages with those in the dominant language. Specifically, we harness an off-the-shelf translation model to score the consistency between answers in non-dominant and dominant languages, and adopt this score as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), together with improved reasoning consistency across languages.

Figure: Accuracy across ten languages after training MathOctopus 7B on preference datasets built with various translation models.

Overview

  • The MAPO framework aims to enhance the reasoning abilities of LLMs across multiple languages by aligning reasoning processes in non-dominant languages to those in a dominant language, typically English, through preference optimization techniques.

  • The framework employs a two-stage process: first, estimating multilingual alignment with an off-the-shelf translation model, and then optimizing the model with that alignment as the preference signal, via Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO). This method does not rely on expensive and error-prone reasoning annotations.

  • Experimental results across three multilingual benchmarks (MSVAMP, MGSM, and MNumGLUESub) in 10 languages showed significant improvements in accuracy, demonstrating the framework's ability to enhance robustness and generalization in multilingual reasoning tasks.

MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization

The paper "MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization" addresses the challenge of enhancing the reasoning abilities of LLMs across multiple languages. While reasoning abilities should theoretically be language-agnostic, practical inconsistencies across languages exist, notably due to the imbalance of multilingual training data. The central proposal of the work is the Multilingual-Alignment-as-Preference Optimization (MAPO) framework, which aims to align reasoning processes in non-dominant languages with those in a dominant language, typically English.

Key Contributions and Methodology

The authors propose a two-stage framework for improving multilingual reasoning:

  1. Preference Estimation via Multilingual Alignment: This stage leverages an off-the-shelf translation model to estimate how well the reasoning process in a non-dominant language aligns with the reasoning process in the dominant language. The translation probability between the two serves as an alignment score, indicating the consistency and correctness of the non-dominant-language reasoning (see the scoring sketch after this list).
  2. Preference Optimization: In this stage, the alignment scores are used as preferences to guide optimization. The authors employ two established preference optimization techniques: Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). Through iterative optimization, the model is trained to generate reasoning processes in non-dominant languages that align more closely with those in the dominant language (an illustrative pair-construction sketch appears below, after the next paragraph).

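To make stage 1 concrete, the sketch below scores a non-English reasoning chain by the average per-token log-probability that an off-the-shelf translation model assigns to the corresponding English reasoning chain, given the non-English one. The model name (facebook/mbart-large-50-many-to-many-mmt), the language codes, and the length-normalized scoring are illustrative assumptions standing in for the paper's exact configuration.

```python
# A minimal sketch of "translation probability as alignment score".
# Assumptions: mBART-50 as a stand-in translation model; the score is the
# mean per-token log-probability of the English reasoning given the
# non-English reasoning (teacher forcing). Not the paper's exact setup.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/mbart-large-50-many-to-many-mmt"  # assumed stand-in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def alignment_score(non_en_reasoning: str, en_reasoning: str,
                    src_lang: str = "sw_KE", tgt_lang: str = "en_XX") -> float:
    """Higher score = the non-English chain 'translates into' the English
    chain more readily, i.e., the two reasoning processes are better aligned."""
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    enc = tokenizer(non_en_reasoning, text_target=en_reasoning,
                    return_tensors="pt", truncation=True)
    # With labels supplied, the model returns the mean cross-entropy over
    # target tokens; its negation is the average per-token log-probability.
    out = model(input_ids=enc["input_ids"],
                attention_mask=enc["attention_mask"],
                labels=enc["labels"])
    return -out.loss.item()
```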
Notably, the MAPO framework does not rely on reasoning annotations, which are typically expensive and error-prone, especially when translated into non-dominant languages. Instead, alignment leverages the model's stronger reasoning in the dominant language as a reference, thereby striving for improved consistency and accuracy across languages.
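Building on the hypothetical alignment_score helper above, one plausible way to turn alignment into training data is to sample several reasoning chains in the non-dominant language for each question, rank them by their alignment with the English chain, and treat the best and worst as a DPO (chosen, rejected) pair. The pairing rule below is an assumption made for illustration, not necessarily the paper's exact recipe; for PPO, the same score could instead be used directly as a scalar reward.

```python
# Illustrative construction of DPO-style preference pairs from alignment
# scores. Reuses the assumed alignment_score() helper sketched above.
from dataclasses import dataclass
from typing import List

@dataclass
class PreferencePair:
    prompt: str    # the question posed in the non-dominant language
    chosen: str    # sampled reasoning chain best aligned with the English chain
    rejected: str  # sampled reasoning chain least aligned with the English chain

def build_pair(prompt: str, sampled_reasonings: List[str],
               en_reasoning: str) -> PreferencePair:
    # Rank the model's sampled non-English chains by how well each aligns
    # with the dominant-language (English) reasoning for the same question.
    ranked = sorted(sampled_reasonings,
                    key=lambda r: alignment_score(r, en_reasoning))
    return PreferencePair(prompt=prompt, chosen=ranked[-1], rejected=ranked[0])
```

Pairs in this (prompt, chosen, rejected) format match what standard DPO implementations consume, so the resulting dataset can feed directly into the preference-optimization stage.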

Experimental Results

The experiments were conducted on three multilingual reasoning benchmarks: MSVAMP, MGSM, and MNumGLUESub, encompassing 10 languages. The MAPO framework demonstrated substantial improvements across all datasets:

  • MSVAMP: Accuracy improved by up to 16.2%, the largest gain among the three benchmarks. Since MSVAMP serves as an out-of-domain test set, this result underscores the framework's ability to achieve state-of-the-art performance and indicates enhanced robustness and generalization.
  • MGSM and MNumGLUESub: Accuracy improved by up to 6.1% and 13.3%, respectively, corroborating the efficacy of the framework on diverse and challenging multilingual reasoning tasks.

Implications and Future Directions

The implications of the MAPO framework are significant for both practical applications and theoretical advancements in AI. Practically, the framework offers a scalable and efficient method to enhance reasoning capabilities across languages without the need for extensive multilingual annotated datasets. Theoretically, it underscores the viability of preference optimization as an effective strategy for aligning reasoning processes and improving multilingual AI performance.

Future research could explore several avenues:

  1. Scaling to Larger Models: Investigating the impact of MAPO on larger models, such as 13B and 70B parameter LLMs, to determine its scalability and potential for even greater performance gains.
  2. Broader Range of Preference Optimization Techniques: Examining other preference optimization methods to possibly improve the effectiveness and efficiency of the MAPO framework.
  3. Real-world Applications: Implementing MAPO in real-world scenarios to assess its practical utility and to fine-tune the framework based on empirical insights.

Conclusion

The MAPO framework represents a notable advancement in the field of multilingual reasoning. By aligning the reasoning processes across languages through preference optimization techniques, the authors demonstrate significant improvements in model performance and consistency. This work paves the way for further advancements in multilingual AI, potentially leading to more robust, accurate, and language-agnostic reasoning capabilities in LLMs.
