PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music (2207.07336v4)

Published 15 Jul 2022 in eess.AS, cs.SD, and eess.SP

Abstract: Lyrics transcription of polyphonic music is challenging because the background music affects lyrics intelligibility. Typically, lyrics transcription is performed by a two-step pipeline, i.e., a singing vocal extraction front end followed by a lyrics transcriber back end, where the front end and back end are trained separately. Such a two-step pipeline suffers from both imperfect vocal extraction and a mismatch between the front end and the back end. In this work, we propose a novel end-to-end integrated fine-tuning framework, which we call PoLyScriber, to globally optimize the vocal extractor front end and the lyrics transcriber back end for lyrics transcription in polyphonic music. The experimental results show that the proposed PoLyScriber achieves substantial improvements over existing approaches on publicly available test datasets.

Authors (3)
  1. Xiaoxue Gao (21 papers)
  2. Chitralekha Gupta (15 papers)
  3. Haizhou Li (286 papers)
Citations (6)
