Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems (2307.01310v2)
Abstract: Recent advances in Named Entity Recognition (NER) have significantly improved text classification capabilities. This paper focuses on spoken NER, aimed specifically at spoken document retrieval, an area that remains understudied because comprehensive datasets for spoken contexts are lacking. The potential of cross-lingual transfer learning in low-resource settings also deserves further investigation. We applied transfer learning techniques across Dutch, English, and German using both pipeline and End-to-End (E2E) approaches, employing Wav2Vec2 XLS-R models on custom pseudo-annotated datasets to evaluate the adaptability of cross-lingual systems. We also compared different architectural configurations to assess the robustness of these systems in spoken NER. The E2E model outperformed the pipeline model, particularly when annotation resources were limited. Furthermore, transfer learning from German to Dutch improved performance by 7% over the standalone Dutch E2E system and by 4% over the Dutch pipeline model. Our findings highlight the effectiveness of cross-lingual transfer in spoken NER and emphasize the need for additional data collection to improve these systems.
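To make the contrast between the two approaches more concrete, the following is a minimal sketch and not the paper's implementation. A pipeline system first transcribes the audio and then runs a text NER tagger on the transcript, while an E2E system produces entities directly from speech. The sketch assumes a hypothetical E2E output format in which the model emits a transcript with inline entity markers such as `[PER ... ]`. The marker syntax, the labels, and the helper name `extract_entities` are illustrative assumptions.

```python
import re

# Assumed inline markup for E2E output: "[LABEL entity words ]".
# This format is hypothetical and is not taken from the paper.
ENTITY_PATTERN = re.compile(r"\[(?P<label>[A-Z]+) (?P<text>[^\[\]]+?) ?\]")


def extract_entities(tagged_transcript):
    """Split a tagged transcript into plain text and (entity, label) pairs."""
    entities = [
        (match.group("text").strip(), match.group("label"))
        for match in ENTITY_PATTERN.finditer(tagged_transcript)
    ]
    # Drop the markers but keep the entity words in the running text.
    plain = ENTITY_PATTERN.sub(lambda m: m.group("text").strip(), tagged_transcript)
    plain = " ".join(plain.split())  # normalise whitespace
    return plain, entities


# Hypothetical E2E model output for one utterance.
e2e_output = "[PER angela merkel ] visited [LOC amsterdam ] yesterday"
plain, entities = extract_entities(e2e_output)
print(plain)
print(entities)
```

In a pipeline setup, by contrast, the plain transcript would come from the speech recogniser alone and a separate text tagger would supply the labels, so recognition errors propagate into the tagging step.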