Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse? (1704.04441v1)

Published 14 Apr 2017 in cs.CL

Abstract: This paper investigates the robustness of NLP against perturbed word forms. While neural approaches can achieve (almost) human-like accuracy for certain tasks and conditions, they often are sensitive to small changes in the input such as non-canonical input (e.g., typos). Yet both stability and robustness are desired properties in applications involving user-generated content, and the more as humans easily cope with such noisy or adversary conditions. In this paper, we study the impact of noisy input. We consider different noise distributions (one type of noise, combination of noise types) and mismatched noise distributions for training and testing. Moreover, we empirically evaluate the robustness of different models (convolutional neural networks, recurrent neural networks, non-neural models), different basic units (characters, byte pair encoding units), and different NLP tasks (morphological tagging, machine translation).

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Georg Heigold (9 papers)
  2. Günter Neumann (9 papers)
  3. Josef van Genabith (43 papers)
Citations (61)

Summary

We haven't generated a summary for this paper yet.