
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT (2306.17103v4)

Published 29 Jun 2023 in cs.CL, cs.SD, and eess.AS

Abstract: We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based LLM. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with a strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task.
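The abstract reports results in terms of Word Error Rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help. The function name `word_error_rate` and the normalization (lowercasing and whitespace splitting) are assumptions for illustration. The abstract does not specify the paper's own text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; the paper's exact normalization is not stated here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, comparing the reference "we will rock you" against the hypothesis "we will walk you" yields one substitution over four reference words, so the function returns 0.25. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.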

Authors (13)
  1. Le Zhuo (25 papers)
  2. Ruibin Yuan (43 papers)
  3. Jiahao Pan (13 papers)
  4. Yinghao Ma (24 papers)
  5. Ge Zhang (170 papers)
  6. Si Liu (132 papers)
  7. Roger Dannenberg (8 papers)
  8. Jie Fu (229 papers)
  9. Chenghua Lin (127 papers)
  10. Emmanouil Benetos (89 papers)
  11. Wei Xue (150 papers)
  12. Yike Guo (145 papers)
  13. Yizhi Li (43 papers)
Citations (14)
