Abstract

Neural machine translation systems estimate probabilities of target sentences given source sentences, yet these estimates may not align with human preferences. This work introduces QE-fusion, a method utilizing a quality estimation metric (QE) that better correlates with human judgments to synthesize improved translations. QE-fusion leverages a candidate pool sampled from a model, combining spans from different candidates using QE metrics such as CometKiwi. We compare QE-fusion against beam search and recent reranking techniques, such as Minimum Bayes Risk decoding or QE-reranking. Our method consistently improves translation quality in terms of COMET and BLEURT scores when applied to LLMs used for translation (PolyLM, XGLM, Llama2, and Mistral) and to multilingual translation models (NLLB), over five language pairs. Notably, QE-fusion exhibits larger improvements for LLMs due to their ability to generate diverse outputs. We demonstrate that our approach generates novel translations in over half of the cases and consistently outperforms other methods across varying numbers of candidates (5-200). Furthermore, we empirically establish that QE-fusion scales linearly with the number of candidates in the pool. QE-fusion proves effective in enhancing LLM-based translation without the need for costly retraining of LLMs.

QE-fusion improves BLEURT scores and synthesizes novel outputs that do not appear in the candidate pool, even as the pool size grows.

Overview

  • The paper introduces QE-fusion, an algorithm that improves neural machine translation (NMT) by combining spans from multiple translation hypotheses using quality estimation metrics.

  • An extensive evaluation demonstrates QE-fusion's superiority over traditional methods like beam search and reranking, particularly for LLMs and multilingual NMT models.

  • QE-fusion not only scales efficiently with the number of candidates but also avoids the need for retraining models, making it a viable solution for enhancing translation quality in real-world applications.

Combining Machine Translation Hypotheses Using Quality Estimation: A Formal Analysis

Introduction

The paper "Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation" by Giorgos Vernikos and Andrei Popescu-Belis proposes QE-fusion, an innovative methodology designed to enhance neural machine translation (NMT) outputs. Traditionally, NMT systems estimate the probability of target sentences based on source sentences, often leveraging beam search and reranking techniques to enhance translation quality. However, such methods exhibit limitations, especially when candidate outputs contain complementary errors.

Methodology

The central contribution of this work is QE-fusion, an algorithm that synthesizes improved translations by combining spans from multiple candidates using quality estimation metrics like CometKiwi. Unlike beam search, QE-fusion begins with a pool of candidates generated via sampling techniques (e.g., nucleus sampling for LLMs and epsilon sampling for multilingual translation models). It identifies divergent spans among these candidates and creates new hypotheses, incrementally integrating spans that contribute the highest estimated quality according to the QE metric.
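The greedy span-fusion procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `align_spans` and `qe_score` are hypothetical helpers standing in for, respectively, an alignment step that locates where the candidates diverge from a base hypothesis and a reference-free QE metric such as CometKiwi.

```python
# Minimal sketch of the greedy span-fusion loop described above (not the
# authors' implementation). `align_spans` and `qe_score` are hypothetical
# helpers: the former returns the token spans of a base hypothesis where the
# other candidates diverge, together with the alternative strings they
# propose; the latter wraps a reference-free QE metric such as CometKiwi.

def replace_span(hypothesis, span, replacement):
    """Swap the tokens covered by (start, end) for the replacement string."""
    tokens = hypothesis.split()
    start, end = span
    return " ".join(tokens[:start] + replacement.split() + tokens[end:])


def qe_fusion(source, candidates, align_spans, qe_score):
    """Greedily merge divergent spans from a candidate pool into one output."""
    # Start from the candidate that the QE metric already prefers.
    base = max(candidates, key=lambda cand: qe_score(source, cand))

    # Map each divergent span of the base to the alternatives proposed by the
    # other candidates, e.g. {(3, 5): ["the shore", "the bank"], ...}.
    divergent = align_spans(base, candidates)

    fused = base
    # Process spans right to left so earlier token indices stay valid after
    # a replacement changes the length of the hypothesis.
    for span, alternatives in sorted(divergent.items(), reverse=True):
        best, best_score = fused, qe_score(source, fused)
        for alt in alternatives:
            variant = replace_span(fused, span, alt)
            score = qe_score(source, variant)
            if score > best_score:  # keep the edit only if QE improves
                best, best_score = variant, score
        fused = best
    return fused
```

Because each accepted edit is kept only when the QE score increases, the fused output is guaranteed to score at least as well as the best single candidate under the QE metric.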

Experimental Setup

The paper presents a rigorous evaluation framework encompassing multiple LLMs (e.g., PolyLM, XGLM, Llama2, ALMA, and Mistral) and multilingual NMT models (e.g., NLLB) across five language pairs, using the WMT22 and Flores-200 datasets. Performance is measured with BLEU, chrF, COMET, and BLEURT, with particular emphasis on the neural metrics (COMET and BLEURT) because they correlate more strongly with human judgments.
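As a concrete, purely illustrative example (not taken from the paper), a candidate pool can be scored with the reference-free CometKiwi metric through the `unbabel-comet` package as shown below. The French source and its candidate translations are made up, and the checkpoint is gated on Hugging Face, so its license must be accepted before it can be downloaded.

```python
# Illustrative scoring of a candidate pool with the reference-free CometKiwi
# model via the unbabel-comet package (pip install unbabel-comet). The
# sentences are made-up examples, not data from the paper.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-cometkiwi-da")
qe_model = load_from_checkpoint(model_path)

source = "Le chat dort sur le canapé."
candidates = [
    "The cat sleeps on the couch.",
    "The cat is sleeping on the sofa.",
    "The dog sleeps on the couch.",
]

# CometKiwi is reference-free: each item needs only the source and the hypothesis.
data = [{"src": source, "mt": cand} for cand in candidates]
output = qe_model.predict(data, batch_size=8, gpus=0)

for cand, score in zip(candidates, output.scores):
    print(f"{score:.4f}  {cand}")
```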

Key Findings

Performance Improvements

QE-fusion consistently outperforms traditional methods such as beam search as well as advanced reranking techniques, including Minimum Bayes Risk (MBR) decoding and QE-reranking. Notably, LLMs benefit the most from QE-fusion because they generate more diverse candidate pools. The method produces novel translations, i.e., outputs not present in the candidate pool, in over half of the evaluated cases, showing that it can compose translations the model might not generate on its own.
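For contrast with these reranking baselines, a minimal sketch of QE-reranking and MBR decoding over the same candidate pool is given below, assuming `qe_score` and `utility` wrappers for a reference-free QE metric and a pairwise similarity metric, respectively. Both baselines can only return one of the existing candidates, whereas QE-fusion can compose a new one.

```python
# Hedged sketch of the two reranking baselines, for contrast with QE-fusion.
# `qe_score` and `utility` are assumed wrappers: a reference-free QE metric
# (source, hypothesis) -> float, and a pairwise similarity metric
# (hypothesis, pseudo-reference) -> float such as chrF or COMET.

def qe_rerank(source, candidates, qe_score):
    """QE-reranking: pick the pool candidate with the highest QE score."""
    return max(candidates, key=lambda cand: qe_score(source, cand))


def mbr_decode(candidates, utility):
    """MBR decoding: pick the candidate with the highest expected utility,
    using the other candidates in the pool as pseudo-references."""
    def expected_utility(cand):
        others = [ref for ref in candidates if ref is not cand]
        return sum(utility(cand, ref) for ref in others) / max(len(others), 1)

    return max(candidates, key=expected_utility)
```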

Scalability

The authors empirically establish that QE-fusion scales linearly with the number of candidates, a critical factor given the computational expense associated with quality estimation metrics. This computational efficiency makes QE-fusion a practical choice for real-world applications without the need for retraining the underlying translation models.

Theoretical and Practical Implications

Theoretically, QE-fusion challenges the conventional reranking paradigm by demonstrating that combining candidate spans can lead to superior translations. Practically, this approach circumvents the need for expensive retraining of LLMs, offering a more efficient path toward translation improvement. The results suggest potential applications beyond MT, such as enhancing general language generation tasks through integration with reward models from Reinforcement Learning from Human Feedback (RLHF).

Future Directions

Future research could focus on:

  1. Extending QE-fusion to Other Domains: Applying the combination strategy to diverse language generation tasks.
  2. Optimizations: Further reducing computational costs through advanced techniques such as pruning or model distillation.
  3. Handling Low-Resource Languages: Investigating the efficacy of QE-fusion in low-resource language pairs, possibly incorporating external linguistic resources to boost performance.

Conclusion

The QE-fusion algorithm represents a significant advancement in machine translation, providing a robust solution to leverage the inherent diversity in model-generated candidates. This method's consistent superiority over traditional and advanced reranking techniques underlines its potential to redefine the landscape of neural machine translation and natural language processing. As a scalable and effective approach, QE-fusion is poised to facilitate the development of more nuanced and accurate translation models without necessitating intensive retraining efforts.
