
From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning (2403.08525v2)

Published 13 Mar 2024 in cs.SD, cs.LG, and eess.AS

Abstract: We propose an adaptive change-point detection method (A-CPD) for machine-guided weak-label annotation of audio recording segments. The goal is to maximize the amount of information gained about the temporal activations of the target sounds. For each unlabeled audio recording, we use a prediction model to derive a probability curve that guides annotation. The prediction model is initially pre-trained on available annotated sound event data with classes disjoint from those in the unlabeled dataset, and then gradually adapts to the annotations provided by the annotator in an active learning loop. Query segments that guide the weak-label annotator towards strong labels are derived by applying change-point detection to these probabilities. We show that strong labels of high quality can be derived with a limited annotation budget, and report favorable results for A-CPD compared to two baseline query segment strategies.
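The query mechanism described in the abstract, running change-point detection on a model's probability curve to carve a recording into query segments for a weak-label annotator, can be illustrated with a minimal sketch. This is not the paper's A-CPD implementation: the simple first-difference change detector, the function name, and the toy signal below are assumptions made purely for illustration.

import numpy as np

def query_segments_from_probs(probs, times, n_queries):
    # Absolute first difference of the probability curve; large values
    # mark likely event onsets/offsets (a crude change-point detector,
    # standing in for the paper's adaptive method).
    change = np.abs(np.diff(probs))
    # Keep the (n_queries - 1) strongest change points as inner boundaries.
    cps = np.sort(np.argsort(change)[-(n_queries - 1):] + 1)
    bounds = np.concatenate(([0], cps, [len(probs) - 1]))
    # Each consecutive pair of boundaries is one query segment to present
    # to the weak-label annotator ("is the target sound active here?").
    return [(times[a], times[b]) for a, b in zip(bounds[:-1], bounds[1:])]

# Toy usage: a synthetic probability curve over a 10-second recording.
times = np.linspace(0.0, 10.0, 101)
probs = np.clip(np.sin(2 * np.pi * times / 5.0), 0.0, 1.0)
print(query_segments_from_probs(probs, times, n_queries=4))

In the paper's active learning loop, the probability curve comes from a prediction model that is updated after each round of annotation, so the detected change points, and hence the query segments, adapt as labels accumulate.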

Authors (4)
  1. John Martinsson (7 papers)
  2. Olof Mogren (18 papers)
  3. Maria Sandsten (2 papers)
  4. Tuomas Virtanen (112 papers)
