Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings? (2004.05985v1)

Published 13 Apr 2020 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models and make punctuation restoration a challenging task. These errors usually take the form of homonyms. We show how retrofitting the word embeddings on domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homonym embeddings, together with a validation of this method on the punctuation prediction task. We observe absolute improvements in punctuation prediction accuracy ranging from 6.2% (for question marks) to 9% (for periods) compared with the state-of-the-art model.
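The abstract does not spell out the authors' exact alignment procedure. As a rough illustration of what "retrofitting" means, the sketch below implements the classic retrofitting update of Faruqui et al. (2015), which pulls each word vector toward its neighbours in a lexicon while keeping it close to its original embedding: q_i = (alpha * q̂_i + sum_j beta_ij * q_j) / (alpha + sum_j beta_ij), with beta_ij = 1 / degree(i). The lexicon of homonym pairs (e.g. "there"/"their"), the function name `retrofit`, and the toy 2-D vectors are all hypothetical assumptions for illustration, not the paper's data or code.

```python
from typing import Dict, List, Set

Vector = List[float]


def retrofit(
    embeddings: Dict[str, Vector],
    lexicon: Dict[str, Set[str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Faruqui et al. (2015)-style retrofitting (illustrative sketch).

    `lexicon` maps a word to a set of related words; here it is assumed
    to hold hypothetical homonym pairs that ASR tends to confuse.
    Words absent from the lexicon keep their original vectors.
    """
    original = {w: list(v) for w, v in embeddings.items()}
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            if word not in new:
                continue
            nbrs = [n for n in neighbours if n in new and n != word]
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # beta_ij = 1 / degree(i)
            dim = len(original[word])
            # alpha * original vector + beta * (current neighbour vectors)
            updated = [alpha * original[word][d] for d in range(dim)]
            for n in nbrs:
                for d in range(dim):
                    updated[d] += beta * new[n][d]
            denom = alpha + beta * len(nbrs)
            new[word] = [x / denom for x in updated]
    return new


if __name__ == "__main__":
    emb = {"there": [1.0, 0.0], "their": [0.0, 1.0]}
    lex = {"there": {"their"}, "their": {"there"}}
    out = retrofit(emb, lex, iterations=50)
    print([round(x, 3) for x in out["there"]])  # → [0.667, 0.333]
```

In this toy run the two homonym vectors, initially orthogonal, converge toward each other, so a downstream model sees less difference between a word and the homonym an ASR system might substitute for it. Updating vectors in place within each sweep mirrors the original retrofitting code; the input dictionary is copied and left unchanged.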

Authors (8)
  1. Łukasz Augustyniak (14 papers)
  2. Mikołaj Morzy (11 papers)
  3. Piotr Zelasko (95 papers)
  4. Adrian Szymczak (6 papers)
  5. Jan Mizgajski (5 papers)
  6. Yishay Carmiel (7 papers)
  7. Najim Dehak (71 papers)
  8. Piotr Szymanski (1 paper)
Citations (7)
