Domain Information Control at Inference Time for Acoustic Scene Classification (2306.08010v1)
Abstract: Domain shift is a well-known challenge in machine learning, as it can cause significant degradation of model performance. In Acoustic Scene Classification (ASC), domain shift arises mainly from differences between recording devices. Several studies have targeted domain generalization to improve the performance of ASC models on unseen domains, such as new devices. Recently, the Controllable Gate Adapter (ConGater) was proposed in Natural Language Processing to address the problem of biased training data; its main advantage is the continuous and selective debiasing of a trained model at inference time. In this work, we adapt ConGater to an audio spectrogram transformer for acoustic scene classification. We show that ConGater can be used to selectively adapt the learned representations to be invariant to device domain shift. Our analysis shows that ConGater can progressively remove device information from the learned representations and improve model generalization, especially under domain shift conditions (e.g., unseen devices). We further show that information removal can be extended to both the device and the location domain. Finally, we demonstrate ConGater's ability to enhance performance on specific devices without any further training.
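To make the inference-time control mechanism concrete, below is a minimal, hypothetical sketch of a ConGater-style gated adapter layer in PyTorch. It is not the authors' implementation: the bottleneck gate network, the linear interpolation between an identity gate (omega = 0) and a learned sigmoid gate (omega = 1), and all names (`ConGaterSketch`, `gate_net`, `omega`) are illustrative assumptions; the paper's actual gate activation and its placement inside the audio spectrogram transformer may differ.

```python
import torch
import torch.nn as nn


class ConGaterSketch(nn.Module):
    """Illustrative sketch of a controllable gate adapter (ConGater-style).

    Assumption (not the paper's code): a bottleneck feed-forward gate
    network plus a control scalar `omega` in [0, 1] that interpolates,
    at inference time, between the unchanged representation (omega = 0)
    and the fully gated, attribute-invariant one (omega = 1).
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.gate_net = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck_dim),
            nn.ReLU(),
            nn.Linear(bottleneck_dim, hidden_dim),
            nn.Sigmoid(),  # learned per-dimension gate values in (0, 1)
        )

    def forward(self, h: torch.Tensor, omega: float) -> torch.Tensor:
        # omega == 0 -> gate is all ones, the representation passes through;
        # omega == 1 -> gate == gate_net(h), maximal information removal.
        gate = (1.0 - omega) + omega * self.gate_net(h)
        return h * gate


if __name__ == "__main__":
    # Sweep the control parameter on the same frozen layer: no retraining
    # is needed to change how strongly device information is suppressed.
    layer = ConGaterSketch(hidden_dim=768)
    tokens = torch.randn(2, 10, 768)  # (batch, sequence, hidden)
    for omega in (0.0, 0.5, 1.0):
        print(omega, layer(tokens, omega).shape)
```

Because `omega` is supplied at forward time rather than being a trained weight, the degree of device-information removal can be tuned continuously after training, which is the property the abstract highlights.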
- Shahed Masoudian
- Khaled Koutini
- Markus Schedl
- Gerhard Widmer
- Navid Rekabsaz