Abstract

Rapidly growing numbers of multilingual news consumers pose an increasing challenge to news recommender systems in terms of providing customized recommendations. First, existing neural news recommenders, even when powered by multilingual language models (LMs), suffer substantial performance losses in zero-shot cross-lingual transfer (ZS-XLT). Second, the current paradigm of fine-tuning the backbone LM of a neural recommender on task-specific data is computationally expensive and infeasible in few-shot recommendation and cold-start setups, where data is scarce or completely unavailable. In this work, we propose a news-adapted sentence encoder (NaSE), domain-specialized from a pretrained massively multilingual sentence encoder (SE). To this end, we construct and leverage PolyNews and PolyNewsParallel, two multilingual news-specific corpora. With the news-adapted multilingual SE in place, we test the effectiveness of (i.e., question the need for) supervised fine-tuning for news recommendation, and propose a simple and strong baseline based on (i) frozen NaSE embeddings and (ii) late click-behavior fusion. We show that NaSE achieves state-of-the-art performance in ZS-XLT in true cold-start and few-shot news recommendation.

Figure: Ranking performance of LFRec-SCL with fine-tuned NaSE and LaBSE embeddings across English and 14 languages.

Overview

  • The paper addresses multilingual news recommendation with neural news recommender systems (NNRs) built on multilingual sentence encoders (SEs), focusing on two issues: performance degradation in zero-shot cross-lingual transfer (ZS-XLT) and the computational cost of fine-tuning backbone language models in low-data settings.

  • Contributions include a news-adapted sentence encoder (NaSE), domain-specialized from a pretrained multilingual SE on two newly constructed multilingual news corpora (PolyNews and PolyNewsParallel), and a simple, strong recommendation baseline that combines frozen NaSE embeddings with late click-behavior fusion.

  • Evaluation results indicate that NaSE consistently outperforms other methods in both full-data and few-shot scenarios, demonstrating robustness and efficiency in cross-lingual news recommendation without extensive fine-tuning.

Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation

The study presents a focused investigation into the challenges of recommending multilingual news articles with neural news recommender systems (NNRs) built on multilingual sentence encoders (SEs). The authors highlight two main challenges: performance degradation in zero-shot cross-lingual transfer (ZS-XLT) and the computational infeasibility of fine-tuning backbone language models (LMs) in low-data environments such as few-shot recommendation and cold-start setups.

Contributions

The key contributions of the paper are a news-adapted sentence encoder (NaSE), derived from a pretrained massively multilingual SE, and two multilingual news-specific corpora: PolyNews and PolyNewsParallel. The authors also propose a simple yet robust baseline for news recommendation that combines frozen NaSE embeddings with late click-behavior fusion, sketched below.
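
To make the baseline concrete, here is a minimal sketch of a frozen-encoder, late-fusion recommender. The LaBSE checkpoint stands in for the news-adapted NaSE encoder, and the mean-pooling fusion, function names, and example titles are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch of a frozen-encoder, late-fusion recommendation baseline.
# LaBSE stands in for the news-adapted NaSE checkpoint; mean pooling is
# one plausible late-fusion choice, not necessarily the paper's.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/LaBSE")  # kept frozen

def score_candidates(clicked_titles, candidate_titles):
    """Rank candidates against a late-fused user embedding."""
    clicked = encoder.encode(clicked_titles, normalize_embeddings=True)
    candidates = encoder.encode(candidate_titles, normalize_embeddings=True)
    user = clicked.mean(axis=0)   # late fusion: average click embeddings
    return candidates @ user      # dot-product relevance scores

scores = score_candidates(
    ["Elections see record turnout", "Turnout surges in key districts"],
    ["Final vote tallies released", "New smartphone review"],
)
print(np.argsort(-scores))  # candidate indices, best first
```

Because the encoder is never updated, such a baseline requires no gradient steps at recommendation time, which is what makes it viable in cold-start and few-shot setups.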

Methodology

News-Adapted Sentence Encoder (NaSE)

The authors initialize NaSE from a general-purpose multilingual SE, LaBSE, and specialize it with denoising autoencoding (DAE) and machine translation (MT) objectives on the PolyNews and PolyNewsParallel corpora. Four training strategies are explored: DAE, MT, a combined DAE+MT, and sequential DAE followed by MT (NaSE-SEQ).
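
For the DAE objective, a plausible minimal sketch using the TSDAE-style utilities in sentence-transformers is shown below. The two-sentence corpus is a toy stand-in for PolyNews, and the batch size and epoch count are placeholders; only the LaBSE starting point and the learning rate come from the summary.

```python
# Hedged sketch of the DAE specialization objective via TSDAE-style
# utilities. Toy corpus and most hyperparameters are assumptions.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer
from sentence_transformers.datasets import DenoisingAutoEncoderDataset
from sentence_transformers.losses import DenoisingAutoEncoderLoss

model = SentenceTransformer("sentence-transformers/LaBSE")  # starting point

news_texts = ["Central bank raises rates", "Wildfire contained after a week"]
train_data = DenoisingAutoEncoderDataset(news_texts)  # injects input noise
loader = DataLoader(train_data, batch_size=2, shuffle=True)

# A decoder tied to the encoder learns to reconstruct the clean sentence
# from the embedding of its corrupted version.
loss = DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(loader, loss)],
    epochs=1,
    optimizer_params={"lr": 3e-5},  # learning rate reported in the summary
    show_progress_bar=False,
)
```

The MT objective would analogously pair source and translated news texts from PolyNewsParallel so that translations map to nearby points in the embedding space.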

Training Data and Process

PolyNews consists of approximately 3.9 million multilingual news texts across 77 languages; PolyNewsParallel contains around 5.4 million news translations across 833 language pairs. The training data distribution is adjusted for language resource levels to balance learning across languages. The NaSE variants are trained for 50,000 steps with the AdamW optimizer at a learning rate of 3e-5, and validated on a cross-lingual news recommendation task over the xMIND dataset, a translation of the English MIND dataset into 14 languages.
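
The summary does not spell out the balancing scheme. A common recipe for adjusting a multilingual training distribution by resource level is temperature (alpha-smoothed) sampling, sketched here; the smoothing exponent alpha = 0.3 is an assumed value, not one taken from the paper.

```python
# Sketch of temperature (alpha-smoothed) sampling, a common recipe for
# balancing a multilingual corpus by resource level. alpha is assumed.
import numpy as np

def smoothed_language_probs(counts: dict[str, int], alpha: float = 0.3):
    """Upsample low-resource languages by exponentiating raw frequencies."""
    langs = list(counts)
    freqs = np.array([counts[l] for l in langs], dtype=float)
    probs = freqs / freqs.sum()
    smoothed = probs ** alpha
    return dict(zip(langs, smoothed / smoothed.sum()))

counts = {"en": 2_000_000, "de": 150_000, "sw": 5_000}
print(smoothed_language_probs(counts))
# 'sw' now receives a far larger share than its raw ~0.23% proportion.
```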

Evaluation

Neural News Recommenders (NNRs)

Seven diverse NNR architectures are evaluated:

  1. NAML
  2. MINS
  3. CAUM
  4. MANNeR
  5. LFRec-CE
  6. LFRec-SCL
  7. CAT (a text-agnostic baseline)

Results

The evaluation on the small variants of MIND and xMIND shows that SE-based NNRs outperform text-agnostic baselines, and that using NaSE as the news encoder (NE) yields better performance than fine-tuned LaBSE and non-specialized multilingual LMs, especially when the NE remains frozen.

Key numerical results include:

  • NaSE achieves an nDCG@10 of 39.01% in English and 38.23% averaged across 14 xMIND languages in frozen NE configurations, illustrating the efficacy of news-specific domain adaptation.
  • NaSE consistently shows reduced performance losses in ZS-XLT scenarios compared to LaBSE, with relative improvements in ranking metrics such as MRR and nDCG@10.

A detailed assessment of few-shot scenarios (10, 50, and 100 shots) further underscores NaSE's robustness: it consistently outperforms LaBSE, especially in extreme low-data setups.
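
For reference, the reported ranking metrics can be computed as follows. These are textbook definitions, not the authors' evaluation code; MRR averages the per-impression reciprocal rank over all impressions.

```python
# Textbook implementations of the reported ranking metrics.
# Labels are binary click indicators over a candidate list.
import numpy as np

def reciprocal_rank(labels: np.ndarray, scores: np.ndarray) -> float:
    """Reciprocal rank of the first relevant item in the ranking."""
    ranked = labels[np.argsort(-scores)]
    hits = np.flatnonzero(ranked)
    return 1.0 / (hits[0] + 1) if hits.size else 0.0

def ndcg_at_k(labels: np.ndarray, scores: np.ndarray, k: int = 10) -> float:
    """Normalized discounted cumulative gain over the top-k items."""
    gains = labels[np.argsort(-scores)[:k]]
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = float((gains * discounts).sum())
    ideal = np.sort(labels)[::-1][:k]          # best possible ordering
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

labels = np.array([0, 1, 0, 0, 1])            # clicked candidates
scores = np.array([0.2, 0.9, 0.1, 0.4, 0.3])  # model relevance scores
print(reciprocal_rank(labels, scores), ndcg_at_k(labels, scores))
```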

Implications and Future Directions

Practically, this research highlights the feasibility of using robust domain-adapted SEs without extensive and computationally expensive fine-tuning. Theoretically, it opens new avenues for leveraging pretrained multilingual models for domain-specific tasks through task-agnostic adaptation strategies.

Future research may explore expanding NaSE's language coverage and enhancing the domain adaptation process using larger and more diverse news corpora. Investigating the integration of external user behavior or contextual signals could also further improve the accuracy and relevance of multilingual news recommendations.

The findings set a precedent for next-generation multilingual news recommenders, emphasizing efficiency and cross-lingual capability critical for real-world applications where resource constraints and language diversity pose significant challenges.
