On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification (2402.01274v3)

Published 2 Feb 2024 in cs.SD, cs.LG, and eess.AS

Abstract: In recent years, self-supervised learning has excelled for its capacity to learn robust feature representations from unlabelled data. Networks pretrained through self-supervision serve as effective feature extractors for downstream tasks, including Few-Shot Learning. While the evaluation of unsupervised approaches for few-shot learning is well-established in imagery, it is notably absent in acoustics. This study addresses this gap by assessing large-scale self-supervised models' performance in few-shot audio classification. Additionally, we explore the relationship between a model's few-shot learning capability and other downstream task benchmarks. Our findings reveal state-of-the-art performance in some few-shot problems such as SpeechCommandsv2, as well as strong correlations between speech-based few-shot problems and various downstream audio tasks.
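
The pipeline the abstract alludes to, a frozen self-supervised encoder used as a feature extractor for few-shot audio classification, can be illustrated with a minimal sketch. This assumes a Hugging Face wav2vec 2.0 checkpoint and a simple nearest-centroid episode classifier; the specific encoders, pooling strategy, and episode protocol evaluated in the paper are not reproduced here, and the checkpoint name and helper functions are illustrative only.

# Minimal sketch: frozen self-supervised encoder as a few-shot feature extractor.
# Assumes the Hugging Face `transformers` wav2vec 2.0 base checkpoint; the paper's
# exact models and episode construction are not reproduced here.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

processor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

@torch.no_grad()
def embed(waveforms, sample_rate=16000):
    """Mean-pool the encoder's hidden states into one embedding per audio clip."""
    inputs = processor(waveforms, sampling_rate=sample_rate,
                       return_tensors="pt", padding=True)
    hidden = encoder(inputs.input_values).last_hidden_state  # (batch, frames, dim)
    return hidden.mean(dim=1)                                 # (batch, dim)

def nearest_centroid_episode(support_waves, support_labels, query_waves, n_way):
    """N-way K-shot episode: assign each query to the closest class centroid."""
    support = embed(support_waves)
    query = embed(query_waves)
    labels = torch.tensor(support_labels)
    centroids = torch.stack([support[labels == c].mean(0) for c in range(n_way)])
    sims = torch.nn.functional.cosine_similarity(
        query.unsqueeze(1), centroids.unsqueeze(0), dim=-1)   # (n_query, n_way)
    return sims.argmax(dim=-1)                                 # predicted class per query

In this style of evaluation the encoder weights stay fixed, so few-shot accuracy directly probes how transferable the pretrained representation is, which is the quantity the paper correlates against other downstream benchmarks.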

Authors (4)
  1. Calum Heggan (4 papers)
  2. Sam Budgett (3 papers)
  3. Timothy Hospedales (101 papers)
  4. Mehrdad Yaghoobi (17 papers)
