
TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification (2309.08200v4)

Published 15 Sep 2023 in cs.SD and eess.AS

Abstract: Recent studies focus on developing efficient systems for acoustic scene classification (ASC) using convolutional neural networks (CNNs), which typically consist of consecutive kernels. This paper highlights the benefits of using separate kernels as a more powerful and efficient design approach in ASC tasks. Inspired by the time-frequency nature of audio signals, we propose TF-SepNet, a CNN architecture that separates the feature processing along the time and frequency dimensions. Features resulting from the separate paths are then merged along the channel dimension and forwarded directly to the classifier. Instead of the conventional two-dimensional (2D) kernel, TF-SepNet incorporates one-dimensional (1D) kernels to reduce the computational cost. Experiments were conducted using the TAU Urban Acoustic Scene 2022 Mobile development dataset. The results show that TF-SepNet outperforms comparable state-of-the-art models that use consecutive kernels. A further investigation reveals that the separate kernels lead to a larger effective receptive field (ERF), which enables TF-SepNet to capture more time-frequency features.
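The separate-path idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify layer counts, channel splits, or kernel sizes, so the function names (`conv_along_time`, `conv_along_freq`, `separate_paths`), the zero padding, the odd kernel length, and the single-kernel-per-path setup are all assumptions made for illustration. The spectrogram is assumed to be a list of frequency rows, each holding values over time.

```python
# Illustrative sketch only; names and padding scheme are assumptions,
# not details taken from the TF-SepNet paper.

def conv_along_time(spec, kernel):
    """Slide a 1D kernel along the time axis of every frequency row.

    Uses zero padding so the output has the same shape as the input.
    Assumes an odd kernel length. Like CNN layers, this is a
    cross-correlation (the kernel is not flipped).
    """
    pad = len(kernel) // 2
    out = []
    for row in spec:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(kernel[j] * padded[t + j] for j in range(len(kernel)))
            for t in range(len(row))
        ])
    return out


def transpose(matrix):
    """Swap the frequency and time axes."""
    return [list(col) for col in zip(*matrix)]


def conv_along_freq(spec, kernel):
    """Slide a 1D kernel along the frequency axis of every time frame."""
    return transpose(conv_along_time(transpose(spec), kernel))


def separate_paths(spec, time_kernel, freq_kernel):
    """Process the same input in two independent paths and merge the
    results as two channels (a list of two feature maps)."""
    return [
        conv_along_time(spec, time_kernel),
        conv_along_freq(spec, freq_kernel),
    ]


def weights_2d(k):
    """Weight count of one k x k kernel (single channel, no bias)."""
    return k * k


def weights_1d_pair(k):
    """Weight count of one 1 x k plus one k x 1 kernel."""
    return 2 * k


if __name__ == "__main__":
    spec = [[1, 2, 3], [4, 5, 6]]  # 2 frequency bins x 3 time frames
    time_map, freq_map = separate_paths(spec, [1, 1, 1], [1, 1, 1])
    print(time_map)
    print(freq_map)
    print(weights_2d(3), weights_1d_pair(3))
```

For a kernel of length k, one time kernel plus one frequency kernel holds 2k weights against k² for a single 2D kernel, which is the kind of saving the abstract attributes to 1D kernels. The real network's cost also depends on channel widths and depth, which this sketch does not model.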
