
Abstract

Multilingual Pretrained Language Models (MPLMs) have demonstrated strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline, which improves zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization, and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families, in both the unlabeled setting (+5.1%) and the labeled setting (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on the one hand and, on the other, both the similarity between the high- and low-resource languages and the amount of low-resource pretraining data. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
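To make the pipeline concrete, below is a minimal sketch of a PARC-style retrieval-augmented prompt for the sentiment task. It assumes a multilingual sentence encoder from sentence-transformers for cross-lingual retrieval and a masked MPLM (mBERT) queried through the Hugging Face fill-mask pipeline; the model choices, the cloze template, and the helper build_parc_prompt are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of a PARC-style pipeline: retrieve semantically similar HRL sentences
# and prepend them (with their labels, in the labeled setting) to the LRL input.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Multilingual sentence encoder for cross-lingual retrieval (an assumption;
# the paper may use a different retriever).
encoder = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# High-resource-language (e.g., English) sentence pool. In the labeled setting each
# sentence carries a gold label; in the unlabeled setting labels would be predicted.
hrl_pool = [
    ("The battery dies after an hour, very disappointing.", "negative"),
    ("Great sound quality and easy to set up.", "positive"),
]

def build_parc_prompt(lrl_input, pool, k=1):
    """Retrieve the k most similar HRL sentences and prepend them, with a cloze
    template, to the LRL input (illustrative template, not the paper's exact one)."""
    query_emb = encoder.encode(lrl_input, convert_to_tensor=True)
    pool_embs = encoder.encode([s for s, _ in pool], convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embs)[0]
    top = scores.topk(k).indices.tolist()
    context = " ".join(f"{pool[i][0]} It was {pool[i][1]}." for i in top)
    return f"{context} {lrl_input} It was [MASK]."

# Zero-shot cloze prediction with a masked MPLM (mBERT here, as an assumption).
fill = pipeline("fill-mask", model="bert-base-multilingual-cased")
prompt = build_parc_prompt("Der Akku ist nach einer Stunde leer.", hrl_pool, k=1)
print(fill(prompt, targets=["positive", "negative"]))
```

In the unlabeled setting described in the abstract, the labels of the retrieved HRL sentences would themselves be predicted by the MPLM rather than taken from gold annotations.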

