Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Semi-supervsied Learning-based Sound Event Detection using Freuqency Dynamic Convolution with Large Kernel Attention for DCASE Challenge 2023 Task 4 (2306.06461v1)

Published 10 Jun 2023 in eess.AS and cs.SD

Abstract: This report proposes a frequency dynamic convolution (FDY) with a large kernel attention (LKA)-convolutional recurrent neural network (CRNN) with a pre-trained bidirectional encoder representation from audio transformers (BEATs) embedding-based sound event detection (SED) model that employs a mean-teacher and pseudo-label approach to address the challenge of limited labeled data for DCASE 2023 Task 4. The proposed FDY with LKA integrates the FDY and LKA module to effectively capture time-frequency patterns, long-term dependencies, and high-level semantic information in audio signals. The proposed FDY with LKA-CRNN with a BEATs embedding network is initially trained on the entire DCASE 2023 Task 4 dataset using the mean-teacher approach, generating pseudo-labels for weakly labeled, unlabeled, and the AudioSet. Subsequently, the proposed SED model is retrained using the same pseudo-label approach. A subset of these models is selected for submission, demonstrating superior F1-scores and polyphonic SED score performance on the DCASE 2023 Challenge Task 4 validation dataset.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Ji Won Kim (5 papers)
  2. Sang Won Son (2 papers)
  3. Yoonah Song (2 papers)
  4. Hong Kook Kim (9 papers)
  5. Il Hoon Song (1 paper)
  6. Jeong Eun Lim (3 papers)
Citations (15)

Summary

We haven't generated a summary for this paper yet.