Blind Acoustic Room Parameter Estimation Using Phase Features (2303.07449v1)

Published 13 Mar 2023 in eess.AS, cs.LG, and cs.SD

Abstract: Modeling room acoustics in a field setting involves some degree of blind parameter estimation from noisy and reverberant audio. Modern approaches leverage convolutional neural networks (CNNs) in tandem with time-frequency representations. Using short-time Fourier transforms to develop these spectrogram-like features has shown promising results, but this method implicitly discards a significant amount of audio information in the phase domain. Inspired by recent works in speech enhancement, we propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters, namely, volume and RT60. The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features across a wide range of acoustic spaces. We evaluate the effectiveness of the deployment of these novel features in both single-parameter and multi-parameter estimation strategies, using a novel dataset that consists of publicly available room impulse responses (RIRs), synthesized RIRs, and in-house measurements of real acoustic spaces.

Citations (10)

Summary

Introduction

The paper "Blind Acoustic Room Parameter Estimation Using Phase Features" (2303.07449) presents an approach to estimating acoustic room parameters that adds phase-related features to the magnitude-based features used by existing CNN approaches. The study addresses the challenge of modeling room acoustics from noisy, reverberant single-channel recordings. It builds on prior work by incorporating phase information, which is often discarded in traditional spectral feature extraction, to improve estimation of the "reverberation fingerprint" parameters: room volume and $RT_{60}$.

Methodology

The proposed approach uses a convolutional neural network (CNN) to estimate the reverberation fingerprint parameters. Unlike existing methods that rely primarily on the magnitude spectrum, this work introduces phase-related features, such as the Gammatone phase spectrogram and its derivatives. The network architecture (Figure 1) is kept relatively simple to maintain computational efficiency while still capturing relevant time-frequency patterns in the input features.

Figure 1: A visualization of our CNN architecture from our featurization. Note that the height of each layer is dependent on the dimensionality of the input feature, which varies from experiment to experiment.
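To make the idea of phase-related features concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a plain STFT in place of the Gammatone filterbank named above, and the function names (`stft`, `wrap_phase`, `phase_features`), frame length, and hop size are illustrative choices. It stacks the log-magnitude, the raw phase, and the first-order phase differences along time and frequency into one feature tensor.

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Hann-windowed short-time Fourier transform, shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

def wrap_phase(phi):
    """Wrap angles back into the principal interval around zero."""
    return (phi + np.pi) % (2 * np.pi) - np.pi

def phase_features(signal, frame_len=512, hop=256):
    """Stack log-magnitude, phase, and phase differences (assumed feature set)."""
    spec = stft(signal, frame_len, hop)
    log_mag = np.log(np.abs(spec) + 1e-8)
    phase = np.angle(spec)
    # First-order differences along time (axis 1) and frequency (axis 0),
    # re-wrapped so that jumps across +/- pi do not appear as large values.
    d_time = wrap_phase(np.diff(phase, axis=1, prepend=phase[:, :1]))
    d_freq = wrap_phase(np.diff(phase, axis=0, prepend=phase[:1, :]))
    return np.stack([log_mag, phase, d_time, d_freq])

# Usage: one second of noise at an assumed 16 kHz sampling rate.
features = phase_features(np.random.default_rng(0).standard_normal(16000))
print(features.shape)
```

Each of the four channels has the same time-frequency shape, so they can be fed to a CNN as separate input channels or concatenated along the frequency axis, which is consistent with the caption's remark that the layer height depends on the input feature dimensionality.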

The training data was generated through a multi-stage pipeline that combines public room impulse response (RIR) datasets, synthesized RIRs, and in-house measurements of real acoustic spaces to cover a wide range of room volumes and $RT_{60}$ values. This breadth is what allows the network to generalize across different acoustic environments.
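As a rough illustration of how a reverberant, noisy training example can be produced from an RIR, here is a hedged sketch. The exponentially decaying white-noise RIR is a common textbook simplification and is not claimed to be how the paper synthesized its RIRs; the function names `synthetic_rir` and `reverberate`, the sampling rate, and the SNR value are assumptions made for the example.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, rng=None):
    """Toy RIR: white noise under an exponential envelope (assumed model)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    # The amplitude envelope falls by a factor of 1000 (60 dB) at t = rt60.
    decay = np.exp(-np.log(1000.0) * t / rt60)
    return rng.standard_normal(n) * decay

def reverberate(dry, rir, snr_db, rng=None):
    """Convolve a dry signal with an RIR and add white noise at a target SNR."""
    rng = np.random.default_rng() if rng is None else rng
    wet = np.convolve(dry, rir)[: len(dry)]
    noise = rng.standard_normal(len(wet))
    # Scale the noise so that 10*log10(P_wet / P_noise) equals snr_db.
    scale = np.sqrt(np.mean(wet**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return wet + scale * noise

# Usage: a placeholder "dry" signal stands in for clean speech.
rng = np.random.default_rng(0)
rir = synthetic_rir(rt60=0.5, rng=rng)
noisy = reverberate(rng.standard_normal(4000), rir, snr_db=20.0, rng=rng)
```

In a real pipeline the dry signal would be clean speech and the RIR would come from a measured or simulated room whose volume and $RT_{60}$ serve as the training labels.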

Experiments and Results

The experiments focused on two questions: whether phase-related features improve parameter estimation accuracy, and whether multiple room parameters can be estimated jointly from a shared feature set. Model performance was evaluated using metrics such as mean squared error (MSE) and Pearson correlation for both volume and $RT_{60}$ estimation.
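For reference, the two metrics named above can be computed as in the sketch below. The helper names `mse` and `pearson` and the sample values are illustrative and do not come from the paper.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    a = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

# Usage with made-up RT60 values in seconds.
rt60_true = [0.3, 0.5, 0.8, 1.2]
rt60_pred = [0.4, 0.45, 0.9, 1.0]
print(mse(rt60_true, rt60_pred), pearson(rt60_true, rt60_pred))
```

The two metrics are complementary: MSE penalizes absolute error, while Pearson correlation measures whether predictions track the targets linearly even if they are biased or scaled.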

Results showed that models incorporating phase features outperformed baseline systems using magnitude-only features. Specifically, the +Phase model, which incorporates phase spectrograms and their derivatives, significantly outperformed the baseline in both correlation coefficient and MSE, particularly for volume estimation (Figure 2).

Figure 2: Confusion Matrices of our volume estimation model using +Phase features on our train, validation, and testing dataset splits (from left to right). The dashed red line indicates a perfect prediction.

Joint estimation of room parameters was also explored as a way to reduce the number of required models while balancing accuracy against complexity. The joint +Continuity model, which included phase continuity features, achieved results comparable to the individually optimized systems, suggesting that the shared representation captured the information needed for both volume and $RT_{60}$ estimation.
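One simple way to train a single model on both targets is a weighted sum of per-target errors. The sketch below is an assumption for illustration only: this summary does not state which loss the paper used, and the function name `joint_mse`, the column order, and the weights are hypothetical.

```python
import numpy as np

def joint_mse(pred, target, weights=(1.0, 1.0)):
    """Weighted sum of per-target MSEs (assumed joint objective).

    pred, target: arrays of shape (batch, 2); columns are (volume, RT60).
    Returns the combined loss and the per-target MSE vector.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    per_target = np.mean((pred - target) ** 2, axis=0)
    return float(np.dot(weights, per_target)), per_target

# Usage with made-up, already-normalized targets.
total, per_target = joint_mse([[1.0, 0.5], [3.0, 1.0]], [[2.0, 0.5], [3.0, 2.0]])
```

Because volume and $RT_{60}$ have very different numeric ranges, a practical setup would normalize each target or tune the weights so that neither term dominates the gradient.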

Implications

This study shows how phase-derived features can improve the estimation of room acoustic parameters, with potential applications in audio processing, spatial sound reproduction, and augmented reality. Because the approach works from single-channel input, it needs only one microphone, which lowers hardware requirements and makes real-world deployment more accessible.

Future work may extend these techniques to additional acoustic parameters or adopt more complex architectures to further exploit the information contained in phase features. Multichannel input data could also improve estimation accuracy and broaden the range of acoustic scenarios the method can handle.

Conclusion

The introduction of phase-related features in blind room parameter estimation marks a meaningful
