MELD-ST: An Emotion-aware Speech Translation Dataset (2405.13233v1)
Abstract: Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings, highlighting the need for further research in emotion-aware speech translation systems.