From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning (2403.08525v2)
Abstract: We propose an adaptive change-point detection method (A-CPD) for machine-guided weak-label annotation of audio recording segments. The goal is to maximize the amount of information gained about the temporal activations of the target sounds. For each unlabeled audio recording, we use a prediction model to derive a probability curve that guides annotation. The prediction model is initially pre-trained on available annotated sound event data with classes that are disjoint from the classes in the unlabeled dataset, and then gradually adapts to the annotations provided by the annotator in an active learning loop. Using change-point detection on these probabilities, we derive query segments that guide the weak-label annotator towards strong labels. We show that it is possible to derive strong labels of high quality with a limited annotation budget, and report favorable results for A-CPD compared to two baseline query segment strategies.
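To make the query construction concrete, below is a minimal sketch of how query segments could be derived from the model's probability curve via change-point detection. It is an illustrative approximation under stated assumptions, not the paper's implementation: the function and parameter names (`derive_query_segments`, `num_segments`, `min_separation`) are invented here, and the paper's change-point criterion and budget handling may differ.

```python
import numpy as np

def derive_query_segments(probs, times, num_segments=7, min_separation=5):
    """Sketch of change-point-based query segment derivation.

    probs: per-frame target-class probabilities from the prediction model.
    times: frame timestamps in seconds (same length as probs).
    num_segments: annotation budget, i.e. number of query segments.
    min_separation: minimum distance (in frames) between change points.
    """
    # Change-point strength: absolute first difference of the curve.
    delta = np.abs(np.diff(probs))

    # Greedily pick the most prominent change points, suppressing close
    # neighbours so boundaries do not cluster around a single event edge.
    change_points = []
    for idx in np.argsort(delta)[::-1]:
        if len(change_points) == num_segments - 1:
            break
        if all(abs(idx - cp) >= min_separation for cp in change_points):
            change_points.append(idx)

    # Consecutive boundaries (plus recording start/end) define the segments
    # that are presented to the annotator for weak (presence/absence) labels.
    edges = [0] + [cp + 1 for cp in sorted(change_points)] + [len(probs) - 1]
    return [(times[a], times[b]) for a, b in zip(edges[:-1], edges[1:])]
```

Because the segment boundaries are aligned with predicted changes in target-sound activity, the annotator's weak presence/absence labels per segment approximate strong onset/offset annotations; the prediction model can then be fine-tuned on these annotations and the loop repeated on the remaining unlabeled recordings.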
Authors: John Martinsson, Olof Mogren, Maria Sandsten, Tuomas Virtanen