Blind Acoustic Room Parameter Estimation Using Phase Features
Introduction
The paper "Blind Acoustic Room Parameter Estimation Using Phase Features" (2303.07449) estimates acoustic room parameters by combining phase-related features with the magnitude-based features used by existing CNN estimators. It addresses the difficulty of modeling room acoustics from noisy, single-channel audio recordings. Building on prior work, it adds phase information, which is usually discarded during spectral feature extraction, to improve estimation of the two "reverberation fingerprint" parameters: room volume and RT60 (the time for sound energy to decay by 60 dB).
Methodology
The proposed approach uses a convolutional neural network (CNN) to estimate the reverberation fingerprint parameters. Unlike existing methods, which rely primarily on the magnitude spectrum, it adds phase-related features such as the Gammatone phase spectrogram and its derivatives. The network architecture (Figure 1) is kept relatively simple to maintain computational efficiency while still capturing relevant time-frequency patterns in the input.
Figure 1: A visualization of the CNN architecture applied to the input features. The height of each layer depends on the dimensionality of the input feature, which varies from experiment to experiment.
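As a rough illustration of what a phase spectrogram and its derivatives can look like in practice, the sketch below computes a phase spectrogram together with its first differences along time and frequency. It is a minimal sketch, not the paper's feature extractor: it substitutes a plain Hann-windowed STFT for the Gammatone filterbank, and the function names, frame length, and hop size are assumptions chosen for illustration.

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (bins, frames)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)

def phase_features(signal, frame_len=512, hop=256):
    """Stack log-magnitude, phase, and phase differences as feature channels.

    Assumption: an STFT stands in for the Gammatone filterbank in the paper.
    Returns an array of shape (4, bins, frames).
    """
    spec = stft(signal, frame_len, hop)
    log_mag = np.log(np.abs(spec) + 1e-8)   # magnitude channel
    phase = np.angle(spec)                  # wrapped phase in [-pi, pi]
    # First differences of the unwrapped phase along time and frequency.
    # The first column/row is zero because the signal is prepended with itself.
    d_time = np.diff(np.unwrap(phase, axis=1), axis=1, prepend=phase[:, :1])
    d_freq = np.diff(np.unwrap(phase, axis=0), axis=0, prepend=phase[:1, :])
    return np.stack([log_mag, phase, d_time, d_freq], axis=0)
```

Stacking the channels along a leading axis gives an image-like tensor that a 2-D CNN can consume directly; the number of channels then changes with the feature set under test.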
The training data was generated through an extensive multi-stage pipeline that combines public room impulse response (RIR) datasets with synthetic data to cover a wide range of room volumes and RT60 values. The resulting dataset is intended to let the network generalize across different acoustic environments.
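The core operation behind this kind of data generation is convolving a dry signal with an RIR and adding noise. The sketch below shows that step under stated assumptions: the synthetic RIR is modeled as exponentially decaying Gaussian noise (a common simplification, not necessarily what the paper uses), and the function names, sample rate, and SNR parameter are hypothetical.

```python
import numpy as np

def synthetic_rir(rt60, fs=16000, rng=None):
    """Toy RIR: Gaussian noise whose amplitude envelope falls 60 dB over rt60 seconds."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)   # 20*log10(envelope) = -60 dB at t = rt60
    rir = rng.standard_normal(n) * envelope
    return rir / np.max(np.abs(rir))       # peak-normalize

def reverberate(dry, rir, noise, snr_db):
    """Convolve dry audio with an RIR and add noise scaled to the requested SNR."""
    wet = np.convolve(dry, rir)[: len(dry)]   # keep the original length
    noise = noise[: len(wet)]
    sig_pow = np.mean(wet ** 2)
    noise_pow = np.mean(noise ** 2)
    gain = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return wet + gain * noise
```

Each generated example would then be labeled with the RT60 (and, for measured rooms, the volume) of the RIR that produced it.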
Experiments and Results
The experiments focused on two questions: whether phase-related features improve parameter estimation accuracy, and whether multiple room parameters can be estimated jointly from a shared feature set. Model performance was evaluated using metrics such as mean squared error (MSE) and Pearson correlation for both volume and RT60 estimation.
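For reference, the two metrics named above can be computed as follows. This is a generic sketch of the standard definitions; the function names are illustrative, and whether the targets are compared on a linear or logarithmic scale is left to the caller.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between ground-truth and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D sequences."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    yt = yt - yt.mean()   # center both sequences
    yp = yp - yp.mean()
    return float(np.sum(yt * yp) / np.sqrt(np.sum(yt ** 2) * np.sum(yp ** 2)))
```

MSE penalizes the absolute size of the errors, while Pearson correlation measures how well predictions track the targets regardless of scale or offset, so reporting both guards against a model that is well correlated but biased.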
The results showed that models with phase features outperformed baseline systems using magnitude-only features. In particular, the +Phase model, which incorporates phase spectrograms and their derivatives, achieved higher correlation coefficients and lower MSE than the baseline, most notably for volume estimation.


Figure 2: Confusion matrices of the volume estimation model using +Phase features on the train, validation, and test dataset splits (from left to right). The dashed red line indicates a perfect prediction.
Joint estimation of room parameters was also explored as a way to reduce the number of required models while balancing accuracy and complexity. The joint +Continuity model, which includes phase continuity features, achieved results comparable to individually optimized systems, suggesting that the shared representation captures the information needed for both volume and RT60 estimation.
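One simple way to train a single model on both targets is to give it two outputs and sum the per-target losses. The sketch below illustrates that idea only; the summary does not specify the paper's loss, so the weighted-sum form, the `weights` parameter, and the column order are assumptions.

```python
import numpy as np

def joint_loss(pred, target, weights=(1.0, 1.0)):
    """Weighted sum of per-target MSEs for a two-output regressor.

    pred, target: arrays of shape (batch, 2); columns are assumed to hold
    (volume, RT60), ideally normalized to comparable ranges.
    Returns (total_loss, per_target_mse).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    per_target = np.mean((pred - target) ** 2, axis=0)   # shape (2,)
    total = float(np.dot(np.asarray(weights, dtype=float), per_target))
    return total, per_target
```

Because volume and RT60 differ in scale, normalizing the targets or adjusting the weights keeps one target from dominating the shared representation.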
Implications
This study shows that phase-derived features can improve the estimation of room acoustic parameters, with potential applications in audio processing, spatial sound reproduction, and augmented reality. Because it needs only single-channel input, the approach lowers hardware requirements relative to multichannel methods, making it more accessible for real-world deployment.
Future work may explore extending these techniques to additional acoustic parameters or incorporating more complex architectures to further exploit the rich information contained within phase features. Additionally, leveraging multichannel input data could refine estimation accuracy and broaden the range of applicable acoustical scenarios.
Conclusion
The introduction of phase-related features in blind room parameter estimation marks a meaningful