Capitalization and Punctuation Restoration: a Survey (2111.10746v1)

Published 21 Nov 2021 in cs.CL

Abstract: Ensuring proper punctuation and letter casing is a key pre-processing step towards applying complex natural language processing algorithms. This is especially significant for textual sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and micro-blogging platforms offer unreliable and often wrong punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.

Authors (2)
  1. Dan Tufiş
  2. Vasile Păiş
Citations (16)
