MediaSpeech: Multilanguage ASR Benchmark and Dataset (2103.16193v1)

Published 30 Mar 2021 in eess.AS and cs.SD

Abstract: The performance of automatic speech recognition (ASR) systems is well known to vary across application domains. At the same time, vendors and research groups typically report ASR quality either on limited, simplistic domains (audiobooks, TED talks) or on proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour dataset for evaluating ASR systems in 4 languages: Spanish, French, Turkish and Arabic. The dataset was collected from the official YouTube channels of media outlets in the respective languages and manually transcribed. We estimate that the word error rate (WER) of the transcriptions themselves is under 5%. We have benchmarked many commercially and freely available ASR systems and provide the benchmark results. We also open-source baseline QuartzNet models for each language.
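Both the transcription-quality estimate and the benchmark results above are expressed as word error rate. As a reminder of what that number measures, here is a minimal sketch of the standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment and N is the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization (lowercasing, punctuation removal). The exact normalization used for the MediaSpeech benchmark is not stated here, so scores from this sketch are not necessarily comparable to the published ones.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption: tokens are whitespace-separated words and no normalization
    (case folding, punctuation stripping) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("come" -> "comió") over 4 reference words.
    print(word_error_rate("el gato come pescado", "el gato comió pescado"))
```

Because the denominator is the reference length only, WER can exceed 1.0 when a system inserts many extra words; this is why it is reported as an error rate rather than a bounded accuracy score.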

Authors (8)
  1. Rostislav Kolobov (1 paper)
  2. Olga Okhapkina (1 paper)
  3. Olga Omelchishina (1 paper)
  4. Andrey Platunov (1 paper)
  5. Roman Bedyakin (3 papers)
  6. Vyacheslav Moshkin (1 paper)
  7. Dmitry Menshikov (3 papers)
  8. Nikolay Mikhaylovskiy (10 papers)
Citations (17)