Statistical and Neural Methods for Cross-lingual Entity Label Mapping in Knowledge Graphs (2206.08709v1)

Published 17 Jun 2022 in cs.CL and cs.LG

Abstract: Knowledge bases such as Wikidata amass vast amounts of named entity information, including multilingual labels, which can be extremely useful for multilingual and cross-lingual applications. However, these labels are not guaranteed to convey consistent information across languages, which greatly compromises their usefulness for fields such as machine translation. In this work, we investigate the application of word and sentence alignment techniques, coupled with a matching algorithm, to align cross-lingual entity labels extracted from Wikidata in 10 languages. Our results indicate that the mapping between Wikidata's main labels can be considerably improved (by up to 20 points in F1-score) by any of the employed methods. We show that methods relying on sentence embeddings outperform all others, even across different scripts. We believe that applying such techniques to measure the similarity of label pairs, together with a knowledge base rich in high-quality entity labels, would be an excellent asset to machine translation.
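
The general pipeline the abstract describes can be illustrated with a small sketch. It scores candidate label pairs across two languages, selects one-to-one pairs with a matching step, and evaluates the result with F1. This is not the authors' implementation. As an assumption, character-trigram cosine similarity stands in for the word-alignment and sentence-embedding scorers the paper evaluates. A greedy one-to-one matcher with a hypothetical threshold stands in for their matching algorithm. The labels and gold pairs are made-up examples.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams; '#' marks word boundaries at both ends."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_match(src: list[str], tgt: list[str], threshold: float = 0.3) -> list[tuple[str, str]]:
    """Greedy one-to-one matching: accept the highest-scoring pairs first,
    skipping any label already used, and stop below the (hypothetical) threshold.
    Stand-in scorer: lexical trigram overlap, not sentence embeddings."""
    scored = sorted(
        ((cosine(char_ngrams(s), char_ngrams(t)), s, t) for s in src for t in tgt),
        reverse=True,
    )
    used_src, used_tgt, pairs = set(), set(), []
    for score, s, t in scored:
        if score < threshold:
            break
        if s not in used_src and t not in used_tgt:
            pairs.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return pairs


def f1(predicted: list[tuple[str, str]], gold: list[tuple[str, str]]) -> float:
    """Pair-level F1 of predicted label mappings against a gold mapping."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_set), tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up English and German labels for two entities.
    english = ["Berlin", "Munich"]
    german = ["Berlin", "München"]
    gold = [("Berlin", "Berlin"), ("Munich", "München")]
    pairs = greedy_match(english, german)
    print(pairs)  # [('Berlin', 'Berlin')]
    print(f1(pairs, gold))
```

Because trigram overlap is purely lexical, this stand-in misses "Munich"/"München" and cannot match labels written in different scripts at all. That limitation is why a semantic scorer is needed; the abstract reports that sentence-embedding methods performed best, even across scripts.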

Authors (5)
  1. Gabriel Amaral (7 papers)
  2. Mārcis Pinnis (10 papers)
  3. Inguna Skadiņa (2 papers)
  4. Odinaldo Rodrigues (11 papers)
  5. Elena Simperl (40 papers)
Citations (2)