MFCC-GAN Codec: A New AI-based Audio Coding

(2310.14300)
Published Oct 22, 2023 in eess.AS and cs.SD

Abstract

In this paper, we propose an AI-based audio codec that uses MFCC features in an adversarial setting. We combine a conventional encoder with an adversarially trained decoder to better reconstruct the original waveform. Because GANs perform implicit density estimation, such models are less prone to overfitting. We compare our work with five well-known codecs, namely AAC, AC3, Opus, Vorbis, and Speex, operating at bitrates from 2 kbps to 128 kbps. MFCCGAN36k achieved a state-of-the-art SNR despite having a lower bitrate than AC3 at 128 kbps, AAC at 112 kbps, and Vorbis, Opus, and Speex at 48 kbps. MFCCGAN13k also achieved a high SNR of 27, equal to that of AC3 at 128 kbps and AAC at 112 kbps, at a significantly lower bitrate (13 kbps). MFCCGAN36k achieved a higher NISQA-MOS than AAC at 48 kbps while having a 20% lower bitrate. Furthermore, MFCCGAN13k obtained a NISQA-MOS of 3.9, much higher than AAC at 24, 32, and 48 kbps and AC3 at 32 kbps. Finally, for future work, we suggest adopting loss functions that optimize intelligibility and perceptual metrics in the MFCCGAN structure to improve quality and intelligibility simultaneously.
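The abstract does not spell out the feature pipeline or the exact SNR definition, so the following is only a minimal NumPy sketch of the two quantities it relies on. It assumes a standard MFCC front end (Hann-windowed frames, HTK-style mel filterbank, log compression, orthonormal DCT-II) of the kind the encoder side presumably computes, and the plain global SNR in dB as the objective metric; the paper may use a different variant, such as segmental SNR. All helper names (`mfcc`, `mel_filterbank`, `dct_ii_matrix`, `snr_db`) and parameter values (16 kHz sampling, 13 coefficients, 40 mel bands, 512-point FFT, 160-sample hop) are hypothetical illustrations, not the paper's configuration.

```python
import numpy as np

# Hypothetical helpers: a generic MFCC front end and global SNR, not the paper's code.


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, keeping only the first n_out coefficients."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat


def mfcc(signal, sr, n_mfcc=13, n_mels=40, n_fft=512, hop=160):
    """Return MFCCs of shape (n_frames, n_mfcc); assumes len(signal) >= n_fft."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)  # Hann-windowed frames, no padding
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    return log_mel @ dct_ii_matrix(n_mfcc, n_mels).T  # decorrelate with DCT-II


def snr_db(reference, estimate):
    """Global SNR in dB: 10 * log10(sum(x^2) / sum((x - x_hat)^2))."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))


# Usage: 1 s of a 440 Hz tone at an assumed 16 kHz sampling rate
sr = 16000
x = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
print(mfcc(x, sr).shape)             # (97, 13)
print(round(snr_db(x, 0.9 * x), 2))  # 20.0: the error is 10% of the signal amplitude
```

Because MFCCs keep only a smoothed log-mel envelope and discard phase, the waveform cannot be recovered by simply inverting these steps. Reconstructing it from such compact features is the role the abstract assigns to the adversarially trained decoder.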
