Microphone Conversion: Mitigating Device Variability in Sound Event Classification (2401.06913v1)

Published 12 Jan 2024 in cs.SD, cs.LG, cs.MM, and eess.AS

Abstract: In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method. As SEC systems become increasingly common, it is crucial that they work well with audio from diverse recording devices. Our method addresses limited device diversity in training data by enabling unpaired training to transform input spectrograms as if they were recorded on a different device. Our experiments show that our approach outperforms existing methods in generalization by 5.2%-11.5% in weighted F1 score. Additionally, it surpasses current methods in adaptability across diverse recording devices, achieving a 6.5%-12.8% improvement in weighted F1 score.
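
The abstract describes using CycleGAN with unpaired training to transform input spectrograms so they appear to come from a different recording device. The sketch below illustrates that general recipe, assuming PyTorch; the Generator and Discriminator architectures, spectrogram shapes, and hyperparameters are illustrative stand-ins, not the architecture or settings used in the paper.

```python
# Minimal sketch (not the authors' released code) of CycleGAN-style unpaired
# translation between log-mel spectrograms from two recording devices A and B.
# All architectures and hyperparameters here are illustrative stand-ins.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Tiny fully convolutional network over (batch, 1, mel_bins, frames) inputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """PatchGAN-style critic that scores local spectrogram patches."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, padding=1),
        )

    def forward(self, x):
        return self.net(x)

# Two generators (A->B and B->A) and one discriminator per device domain.
G_ab, G_ba = Generator(), Generator()
D_a, D_b = Discriminator(), Discriminator()
opt_g = torch.optim.Adam(list(G_ab.parameters()) + list(G_ba.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(list(D_a.parameters()) + list(D_b.parameters()), lr=2e-4)
mse, l1 = nn.MSELoss(), nn.L1Loss()  # LSGAN adversarial loss + L1 cycle-consistency

def train_step(spec_a, spec_b, lambda_cyc=10.0):
    """One unpaired training step on spectrogram batches from devices A and B."""
    # Generator update: fool both critics and reconstruct the original spectrograms.
    opt_g.zero_grad()
    fake_b, fake_a = G_ab(spec_a), G_ba(spec_b)
    pred_b, pred_a = D_b(fake_b), D_a(fake_a)
    adv = mse(pred_b, torch.ones_like(pred_b)) + mse(pred_a, torch.ones_like(pred_a))
    cyc = l1(G_ba(fake_b), spec_a) + l1(G_ab(fake_a), spec_b)
    (adv + lambda_cyc * cyc).backward()
    opt_g.step()

    # Discriminator update: real spectrograms -> 1, translated ones -> 0.
    opt_d.zero_grad()
    real_a, real_b = D_a(spec_a), D_b(spec_b)
    fa, fb = D_a(fake_a.detach()), D_b(fake_b.detach())
    d_loss = (mse(real_a, torch.ones_like(real_a)) + mse(fa, torch.zeros_like(fa))
              + mse(real_b, torch.ones_like(real_b)) + mse(fb, torch.zeros_like(fb)))
    d_loss.backward()
    opt_d.step()
    return adv.item(), cyc.item(), d_loss.item()

# Usage with random stand-in batches shaped (batch, 1, mel_bins, frames).
spec_a, spec_b = torch.randn(4, 1, 64, 128), torch.randn(4, 1, 64, 128)
print(train_step(spec_a, spec_b))
```

In the augmentation setting the abstract describes, the trained generators would then be applied to training spectrograms so a clip captured on one device can also be presented to the SEC classifier as if it had been recorded on another, compensating for limited device diversity in the training data.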
