
DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation (2204.09259v1)

Published 20 Apr 2022 in cs.CL and cs.AI

Abstract: Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model that is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the amount of parallel samples it would require. This is nonetheless a desirable capability, as it could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. Our model relies on NMT encoder representations combined with various instance- and corpus-level features. We demonstrate that instance-level features are better able to distinguish between different domains than the corpus-level frameworks proposed in previous studies. Finally, we perform an in-depth analysis of the results, highlighting the limitations of our approach, and provide directions for future research.
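To make the pipeline concrete, here is a minimal sketch of the general idea described in the abstract: encode in-domain source-side monolingual samples with a pre-trained NMT encoder, pool the representations into instance- and corpus-level features, and regress the expected post-adaptation score for a given amount of parallel data. The model name (Helsinki-NLP/opus-mt-en-de), the mean/std pooling, and the gradient-boosting regressor are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a DaLC-style pipeline; feature choices and the
# regressor are assumptions for illustration, not the authors' method.
import numpy as np
import torch
from transformers import MarianMTModel, MarianTokenizer
from sklearn.ensemble import GradientBoostingRegressor

# 1. Load a pre-trained general-domain NMT model (assumed example model).
name = "Helsinki-NLP/opus-mt-en-de"
tok = MarianTokenizer.from_pretrained(name)
nmt = MarianMTModel.from_pretrained(name).eval()

def encoder_features(sentences):
    """Mean-pooled encoder hidden states as instance-level representations."""
    batch = tok(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        enc = nmt.get_encoder()(**batch).last_hidden_state  # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()    # ignore padding
    return (enc * mask).sum(1) / mask.sum(1)                # (B, H)

# 2. Aggregate instance-level features into a corpus-level vector
#    for the in-domain monolingual sample (source language only).
samples = ["Administer 5 mg twice daily.", "Store below 25 degrees Celsius."]
feats = encoder_features(samples)
corpus_vec = torch.cat([feats.mean(0), feats.std(0)]).numpy()

# 3. Regress the expected post-adaptation metric (e.g. BLEU) at a given
#    fine-tuning size. Training pairs (X_train, y_train) would be collected
#    offline by actually adapting the model on domains with parallel data.
# reg = GradientBoostingRegressor().fit(X_train, y_train)
# bleu_at_10k = reg.predict([np.append(corpus_vec, np.log(10_000))])
```

In this sketch, varying the fine-tuning-size feature (the last input dimension) traces out the predicted learning curve for the new domain without requiring any in-domain parallel data.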

Authors (5)
  1. Cheonbok Park (20 papers)
  2. Hantae Kim (3 papers)
  3. Ioan Calapodescu (12 papers)
  4. Hyunchang Cho (4 papers)
  5. Vassilina Nikoulina (28 papers)
Citations (2)
