Abstract

Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to extend this framework to diverse languages. In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is trained on preference data in one source language and directly applied to other target languages. On summarization and open-ended dialog generation, we show that this method is consistently successful under comprehensive evaluation settings, including human evaluation: cross-lingually aligned models are preferred by humans over unaligned models on up to >70% of evaluation instances. We moreover find that a different-language reward model sometimes yields better aligned models than a same-language reward model. We also identify best practices when there is no language-specific data for even supervised finetuning, another component in alignment.

Figure: Cross-lingual reward model transfer, with an English RM repurposed for Spanish alignment, bypassing the usual monolingual pipeline.

Overview

  • This paper explores the potential of using a single-language reward model (RM) to align language models (LMs) across multiple languages, focusing on tasks like summarization and dialog generation.

  • The study uses reinforcement learning and best-of-$n$ reranking as reward optimization methods, finding that cross-lingually transferred RMs sometimes yield better-aligned models than same-language RMs.

  • Key findings indicate that RMs generalize strongly across languages, and that using an RM trained on a different language may reduce overfitting to language-specific artifacts.

  • Future recommendations suggest leveraging RMs from high-resource languages to guide LM alignment in less resourced languages, coupled with thorough evaluations to ensure semantic and pragmatic consistency across languages.

Evaluating Zero-Shot Cross-Lingual Alignment in Language Models Using a Single-Language Reward Model

Introduction

Cross-lingual transfer of reward models (RMs) is a candidate approach for language model (LM) alignment when multilingual preference data are scarce. This work investigates whether a single-language RM can align LMs across multiple languages, offering a potential route to scaling alignment to languages for which no preference data exist.

Zero-Shot Cross-Lingual Transfer of Reward Models

The core methodology transfers an RM trained on one source language to guide the alignment of LMs in target languages. This sidesteps the need for target-language annotated preference data by leveraging the interlingual generality of pretrained multilingual models. The paper studies two tasks, summarization and open-ended dialog generation, using reinforcement learning and best-of-$n$ reranking as reward optimization techniques.
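To make the procedure concrete, below is a minimal sketch of best-of-$n$ reranking with a cross-lingually transferred RM, using Hugging Face transformers. The model names, the prompt formatting fed to the RM, and the pairing of an English-trained RM with a Spanish prompt are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: best-of-n reranking where the reward model was trained on
# preference data in a different language than the prompts it scores.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

policy_name = "my-org/multilingual-sft-policy"      # hypothetical SFT policy
rm_name = "my-org/english-trained-reward-model"     # hypothetical English-trained RM

policy_tok = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(policy_name)
rm_tok = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

def best_of_n(prompt: str, n: int = 8, max_new_tokens: int = 128) -> str:
    """Sample n candidates from the policy and keep the one the RM scores highest."""
    inputs = policy_tok(prompt, return_tensors="pt")
    outputs = policy.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        policy_tok.decode(seq[prompt_len:], skip_special_tokens=True)
        for seq in outputs
    ]
    # Score each candidate with the RM, even though the RM was trained on
    # preference data in a different language. The prompt+response formatting
    # here is an assumption; use whatever format the RM was trained with.
    scores = []
    for cand in candidates:
        rm_inputs = rm_tok(prompt + "\n" + cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(reward_model(**rm_inputs).logits.squeeze().item())
    return candidates[max(range(n), key=lambda i: scores[i])]

# Example: a Spanish summarization prompt scored by the English-trained RM.
print(best_of_n("Resume el siguiente artículo:\n..."))
```

The same RM can serve as the reward signal in an RL loop (e.g., PPO); best-of-$n$ is shown here only because it isolates the RM's cross-lingual scoring behavior without any policy training.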

Cross-lingual effectiveness is measured through comprehensive evaluation, including direct human evaluation and automated judgments from larger LMs (GPT-4 and PaLM-2-L), revealing a surprising observation: models aligned with an RM transferred from another language often surpassed the alignment quality of models that used a same-language RM. This suggests that RMs generalize robustly across input languages, and that biases a same-language RM absorbs from its training data may be sidestepped by using a different-language RM.
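As a rough illustration of this evaluation protocol, the sketch below computes a pairwise win rate from an LM judge's verdicts. The `judge` callable is a placeholder (the actual judging prompts and APIs are not specified here), and the order randomization is a common precaution against position bias rather than a detail taken from the paper.

```python
# Sketch: pairwise win-rate evaluation with an LM judge (e.g., GPT-4 or PaLM-2-L).
# `judge` is a placeholder callable returning "A", "B", or "tie".
import random
from typing import Callable, Iterable, Tuple

def win_rate(
    pairs: Iterable[Tuple[str, str, str]],          # (prompt, aligned_out, baseline_out)
    judge: Callable[[str, str, str], str],
) -> float:
    """Fraction of instances on which the judge prefers the aligned model's output."""
    wins, total = 0, 0
    for prompt, aligned, baseline in pairs:
        # Randomize presentation order to control for position bias in the judge.
        if random.random() < 0.5:
            aligned_won = judge(prompt, aligned, baseline) == "A"
        else:
            aligned_won = judge(prompt, baseline, aligned) == "B"
        wins += aligned_won
        total += 1
    return wins / total
```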

Key Results and Observations

  • Generalizability of RMs: Despite being trained on data from a single language, RMs effectively drove alignment in other languages, with human evaluators preferring the aligned models on up to over 70% of evaluation instances.
  • Comparison with Translate-Train Baseline: RMs transferred cross-lingually in a zero-shot fashion outperformed the translate-train baseline, in which the RM's training data are machine-translated into the target language (see the sketch after this list), suggesting that the original RMs transfer across languages more faithfully than RMs trained on translated data.
  • Unexpected Superiority of Cross-lingual Alignment: In several instances, an RM from a different language yielded better alignment than an RM trained on the target language; the hypothesis is that a different-language RM is less likely to overfit to language-specific artifacts in the target-language training data.
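The translate-train baseline referenced above can be sketched as follows. The field names and the `translate` placeholder are assumptions standing in for whatever MT system and preference-data schema are actually used; the point is only that every side of each preference pair is translated before RM training.

```python
# Sketch: translate-train baseline, where source-language preference data are
# machine-translated into the target language before training the RM.

def translate(text: str, target_lang: str) -> str:
    """Placeholder for an MT system; plug in any translation backend here."""
    raise NotImplementedError

def translate_train_dataset(preference_data, target_lang: str = "es"):
    """Map (prompt, chosen, rejected) triples into the target language."""
    translated = []
    for ex in preference_data:
        translated.append({
            "prompt": translate(ex["prompt"], target_lang),
            "chosen": translate(ex["chosen"], target_lang),
            "rejected": translate(ex["rejected"], target_lang),
        })
    return translated  # the RM is then trained on this data as usual
```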

Implications and Future Directions

The findings underscore the potential to lower the barriers for deploying multilingual LMs aligned to human preferences, especially for under-resourced languages. Cross-lingual RM transfer, by avoiding the need for extensive language-specific annotated data, could democratize the benefits of advanced LMs globally.

However, the implications of this strategy are complex. It opens questions about the extent to which language-agnostic principles of generation quality hold across different contexts and cultural nuances. Conducting further studies on tasks or domains with heavier cultural or context-specific elements could enrich our understanding of the limits of cross-lingual RM transferability.

Recommendations

For practical deployment, an RM trained on a high-resource language such as English can be used to guide alignment in other languages. This should be complemented by rigorous evaluation, and by comparison against in-language RMs where available, to confirm that alignment preserves the intended semantic and pragmatic properties in each target language.

In conclusion, this work represents an important step towards scalable, cross-lingual alignment of LMs, though future research is necessary to refine these methods and fully understand the boundary conditions under which they operate optimally.
