
Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark (2207.13005v2)

Published 26 Jul 2022 in cs.CL

Abstract: Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
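
The abstract reports results as R@1 (recall at 1): the fraction of mentions for which the top-ranked candidate entity is the gold entity. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `recall_at_1`, the input layout (one ranked candidate list per mention), and the Wikidata-style IDs in the usage example are illustrative assumptions.

```python
from typing import Sequence


def recall_at_1(
    ranked_candidates: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
) -> float:
    """Fraction of mentions whose top-ranked candidate equals the gold entity ID.

    Hypothetical helper: the input layout is an assumption made for
    illustration, not the format used by the Hansel benchmark.
    """
    if len(ranked_candidates) != len(gold_ids):
        raise ValueError("need exactly one candidate list per gold ID")
    if not gold_ids:
        return 0.0
    # A mention counts as a hit only if it has candidates and the first one is gold.
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_ids)
        if candidates and candidates[0] == gold
    )
    return hits / len(gold_ids)


# Made-up IDs for three mentions: the first is linked correctly at rank 1,
# the second has the gold entity only at rank 2, the third has no candidates.
predictions = [["Q1", "Q2"], ["Q5", "Q3"], []]
gold = ["Q1", "Q3", "Q9"]
score = recall_at_1(predictions, gold)
```

In this toy example only one of the three mentions is a hit, so `score` is 1/3. A mention with an empty candidate list counts as a miss rather than being skipped, which keeps the denominator equal to the number of annotated mentions.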

Authors (5)
  1. Zhenran Xu (12 papers)
  2. Zifei Shan (16 papers)
  3. Yuxin Li (36 papers)
  4. Baotian Hu (67 papers)
  5. Bing Qin (186 papers)
Citations (7)
