Svarah: Evaluating English ASR Systems on Indian Accents (2305.15760v1)
Abstract: India is the second largest English-speaking country in the world, with a speaker base of roughly 130 million. It is therefore imperative that automatic speech recognition (ASR) systems for English be evaluated on Indian accents. Unfortunately, Indian speakers are poorly represented in existing English ASR benchmarks such as LibriSpeech, Switchboard, and Speech Accent Archive. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data, covering various domains such as history, culture, and tourism, which ensures a diverse vocabulary. We evaluate 6 open source ASR models and 2 commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents. Svarah, as well as all our code, will be publicly available.
- Tahir Javed (9 papers)
- Sakshi Joshi (4 papers)
- Vignesh Nagarajan (2 papers)
- Sai Sundaresan (3 papers)
- Janki Nawale (3 papers)
- Abhigyan Raman (5 papers)
- Kaushal Bhogale (6 papers)
- Pratyush Kumar (44 papers)
- Mitesh M. Khapra (79 papers)