
D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement (2302.11832v1)

Published 23 Feb 2023 in cs.SD and eess.AS

Abstract: Monaural speech enhancement has been widely studied using real-valued networks in the time-frequency (TF) domain. However, because the input and the target are naturally complex-valued in the TF domain, a fully complex network is highly desirable for effectively learning the feature representation and modelling the sequence in the complex domain. Moreover, phase, an important factor for the perceptual quality of speech, has been shown to be learnable together with magnitude from noisy speech using complex masking or complex spectral mapping. Many recent studies focus on either complex masking or complex spectral mapping, ignoring their performance boundaries. To address the above issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former) using joint complex masking and complex spectral mapping for monaural speech enhancement. In D2Former, we extend the conformer network into the complex domain and form a dual-path complex TF self-attention architecture for effectively modelling the complex-valued TF sequence. We further boost the TF feature representation in the encoder and the decoders using a dual-path learning structure that exploits complex dilated convolutions for time dependency and complex feedforward sequential memory networks (CFSMN) for frequency recurrence. In addition, we improve the performance boundaries of complex masking and complex spectral mapping by combining the strengths of the two training targets into a joint-learning framework. As a consequence, D2Former takes full advantage of the complex-valued operations, the dual-path processing, and the joint training targets. Compared to previous models, D2Former achieves state-of-the-art results on the VoiceBank+Demand benchmark with the smallest model size of 0.87M parameters.
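
The two training targets named in the abstract can be illustrated with a small NumPy sketch. Complex masking multiplies the noisy TF spectrum element-wise by a complex-valued mask, so both magnitude and phase are adjusted, while complex spectral mapping estimates the clean spectrum directly. The abstract does not say how the two decoder outputs are fused, so the weighted average in `combine_estimates` and its `alpha` parameter are assumptions made purely for illustration, as are all function names and the toy data.

```python
import numpy as np


def apply_complex_mask(noisy_spec, mask):
    """Complex masking: element-wise complex multiplication.

    Because the mask is complex, it rescales the magnitude and
    rotates the phase of every TF bin.
    """
    return mask * noisy_spec


def combine_estimates(masked_spec, mapped_spec, alpha=0.5):
    """Hypothetical fusion of the two estimates (an assumption,
    not taken from the paper): a convex combination."""
    return alpha * masked_spec + (1.0 - alpha) * mapped_spec


# Toy complex spectra with shape (frequency bins, time frames).
rng = np.random.default_rng(0)
shape = (4, 5)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.3 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Oracle complex ratio mask: the mask a masking decoder would ideally predict.
ideal_mask = clean / noisy
masked = apply_complex_mask(noisy, ideal_mask)

# Stand-in for the output of a mapping decoder (here simply the clean target).
mapped = clean.copy()

joint = combine_estimates(masked, mapped, alpha=0.5)
```

With oracle inputs both branches recover the clean spectrum, so the fused estimate does too. In a trained model each branch would carry its own errors, which is the motivation the abstract gives for learning the two targets jointly.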

Citations (12)


Authors (2)
