Searching For Music Mixing Graphs: A Pruning Approach (2406.01049v2)

Published 3 Jun 2024 in cs.SD

Abstract: Music mixing is compositional -- experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available processors to every chain. Then, after the initial console parameter optimization, we alternate between removing redundant processors and fine-tuning. We achieve this through differentiable implementation of both processors and pruning. Consequently, we find a sparse mixing graph that achieves nearly identical matching quality of the full mixing console. We apply this procedure to dry-mix pairs from various datasets and collect graphs that also can be used to train neural networks for music mixing applications.

Summary

The paper proposes a method using a differentiable mixing console combined with iterative pruning to accurately reverse-engineer music mixing graphs.
It applies a fixed chain of seven processor types to each track and uses a hybrid candidate sampling method during pruning to balance computational cost against sparsity.
Evaluations on MedleyDB, MixingSecrets, and an internal dataset show that about two-thirds of the processors can be pruned while the audio loss rises only slightly (0.409 to 0.422 for the hybrid method).

Overview

The paper presents a method to reverse engineer music mixing processes using input and output audio data. Traditional music mixing involves experts combining multiple audio processors to achieve a cohesive mix. The proposed method aims to replicate this by creating a "mixing console" that applies all available processors to every chain, followed by iterative pruning and fine-tuning to remove redundant processors while maintaining high matching quality.

Methodology

Mixing Console Creation:
- A mixing console is constructed by applying a fixed processing chain, comprising seven types of processors (gain/panning, stereo imager, equalizer, reverb, compressor, noisegate, and multitap delay), to each source track.
- The outputs are then subgrouped and processed again to obtain a final mix.
- All processors are implemented in a differentiable manner to facilitate end-to-end optimization using gradient descent.
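
The console structure described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the paper's implementation: signals are mono lists of floats, the only processor is a plain gain, and the names `gain`, `apply_chain`, `mix_sum`, and `console` are hypothetical. A real console would use differentiable versions of the seven processor types.

```python
from typing import Callable, Dict, List

Signal = List[float]
Processor = Callable[[Signal], Signal]


def gain(amount: float) -> Processor:
    """Hypothetical stand-in processor: scale every sample by a constant."""
    return lambda x: [amount * s for s in x]


def apply_chain(x: Signal, chain: List[Processor]) -> Signal:
    """Run a signal through a fixed, ordered chain of processors."""
    for processor in chain:
        x = processor(x)
    return x


def mix_sum(signals: List[Signal]) -> Signal:
    """Sum equal-length signals sample by sample."""
    return [sum(frame) for frame in zip(*signals)]


def console(
    tracks: Dict[str, Signal],
    track_chains: Dict[str, List[Processor]],
    groups: Dict[str, List[str]],
    group_chains: Dict[str, List[Processor]],
) -> Signal:
    """Process each track, sum tracks into subgroups, process each
    subgroup again, then sum the subgroups into the final mix."""
    buses = []
    for group_name, members in groups.items():
        processed = [apply_chain(tracks[m], track_chains[m]) for m in members]
        buses.append(apply_chain(mix_sum(processed), group_chains[group_name]))
    return mix_sum(buses)


tracks = {
    "kick": [1.0, 0.0, 0.0, 0.0],
    "bass": [0.0, 1.0, 0.0, 0.0],
    "vox": [0.0, 0.0, 1.0, 0.0],
}
track_chains = {name: [gain(0.5)] for name in tracks}
groups = {"rhythm": ["kick", "bass"], "vocals": ["vox"]}
group_chains = {name: [gain(2.0)] for name in groups}

mix = console(tracks, track_chains, groups, group_chains)
print(mix)  # [1.0, 1.0, 1.0, 0.0]
```

Because every track receives the same chain layout, the search problem reduces to deciding which processors in that layout can later be removed.
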
Iterative Pruning:
- After optimizing the initial mixing console parameters, an iterative pruning approach is applied to remove redundant processors.
- The goal is to achieve a sparse mixing graph that closely matches the original mix.
- The pruning process alternates between removing nodes and fine-tuning remaining parameters.
- Three candidate sampling methods for pruning are explored: brute-force, dry/wet, and hybrid. The hybrid method, combining the strengths of both brute-force and dry/wet approaches, is chosen as the default for its balance of computational cost and sparsity.
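
The alternation between removing nodes and fine-tuning can be illustrated with a toy problem. The sketch below is an assumption-laden stand-in, not the paper's algorithm: each "processor" is a single scalar weight on a fixed feature vector, candidates are tried one at a time (closest in spirit to the brute-force option, not the hybrid one), and a removal is accepted only if the loss stays within a tolerance of the fully optimized console. All names (`FEATURES`, `TARGET`, `render`, `fine_tune`, `prune`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# Toy setup: each processor contributes params[name] * FEATURES[name].
FEATURES = {
    "eq": [1.0, 0.0, 0.0, 0.0],
    "comp": [0.0, 1.0, 0.0, 0.0],
    "reverb": [0.0, 0.0, 1.0, 0.0],
    "delay": [0.0, 0.0, 0.0, 1.0],
}
TARGET = [2.0, 0.01, 3.0, 0.0]  # the "reference mix" to match


def render(active: Set[str], params: Dict[str, float]) -> list:
    out = [0.0] * len(TARGET)
    for name in active:
        for k, value in enumerate(FEATURES[name]):
            out[k] += params[name] * value
    return out


def loss(active: Set[str], params: Dict[str, float]) -> float:
    return sum((o - t) ** 2 for o, t in zip(render(active, params), TARGET))


def fine_tune(active, params, steps=200, lr=0.1) -> Dict[str, float]:
    """Plain gradient descent on the parameters of the active processors."""
    params = dict(params)
    for _ in range(steps):
        err = [o - t for o, t in zip(render(active, params), TARGET)]
        for name in active:
            grad = 2.0 * sum(e * v for e, v in zip(err, FEATURES[name]))
            params[name] -= lr * grad
    return params


def prune(tolerance: float = 1e-3) -> Tuple[Set[str], Dict[str, float]]:
    active = set(FEATURES)
    # Initial optimization of the full console.
    params = fine_tune(active, {name: 0.0 for name in active})
    budget = loss(active, params) + tolerance
    changed = True
    while changed:
        changed = False
        for name in sorted(active):
            trial = active - {name}
            # Accept the removal only if the match stays within budget.
            if loss(trial, params) <= budget:
                active, changed = trial, True
        params = fine_tune(active, params)  # re-fit the survivors
    return active, params


kept, tuned = prune()
print(sorted(kept))  # ['eq', 'reverb']
```

Here the `comp` and `delay` nodes barely affect the match, so they are removed, while dropping `eq` or `reverb` would exceed the loss budget.
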
Data and Implementation:
- The method is evaluated on MedleyDB, MixingSecrets, and an internal dataset of Western music mixes.
- The graphs and parameters are optimized using an audio-domain loss, incorporating multi-resolution STFT and gain-staging losses.
- Optimization is performed over multiple steps, with early steps involving full console parameter training and subsequent steps focusing on pruning and fine-tuning.
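
To make the multi-resolution STFT term concrete, here is a minimal sketch using only the standard library. It assumes a mono signal, a Hann window, a naive DFT, and a mean absolute difference between magnitude spectrograms averaged over a few FFT sizes. The exact distance, window sizes, and channel handling used in the paper are not stated here, and the gain-staging term is omitted. The names `stft_mag` and `mrstft_loss` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def stft_mag(x: Sequence[float], n_fft: int, hop: int) -> List[List[float]]:
    """Magnitude STFT via a naive DFT (slow, but dependency-free)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def mrstft_loss(pred, target, fft_sizes=(16, 32, 64)) -> float:
    """Average the spectral magnitude error over several resolutions."""
    total = 0.0
    for n_fft in fft_sizes:
        p_mag = stft_mag(pred, n_fft, n_fft // 4)
        t_mag = stft_mag(target, n_fft, n_fft // 4)
        diffs = [abs(p - t)
                 for p_frame, t_frame in zip(p_mag, t_mag)
                 for p, t in zip(p_frame, t_frame)]
        total += sum(diffs) / len(diffs)
    return total / len(fft_sizes)


reference = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
close_match = [0.9 * s for s in reference]
poor_match = [0.5 * s for s in reference]

print(mrstft_loss(reference, reference))  # 0.0
print(mrstft_loss(close_match, reference) < mrstft_loss(poor_match, reference))  # True
```

Comparing spectra at several window sizes trades off time and frequency resolution, which is the usual motivation for a multi-resolution loss.
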

Results

The audio loss starts at a base value of 19.7. It drops substantially once the gain/panning and stereo imager processors are applied and falls further as the remaining processors are added, reaching a final loss of 0.409.
With the hybrid method, pruning raises the average audio loss only slightly, to 0.422, at a pruning ratio of 0.67. A large share of the processors can therefore be removed without greatly affecting match quality.
The MedleyDB and MixingSecrets datasets report similar pruning ratios, while the internal dataset shows higher sparsity, likely due to the larger initial number of processors.
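
The reported numbers can be put in proportion with a short calculation. This sketch assumes that the pruning ratio is the fraction of processors removed, which this summary does not define, and the helper names are hypothetical.

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change of the loss after pruning."""
    return (after - before) / before


def kept_fraction(pruning_ratio: float) -> float:
    """Assumes pruning ratio = removed processors / initial processors."""
    return 1.0 - pruning_ratio


full_console_loss = 0.409   # reported loss with all processors
pruned_loss = 0.422         # reported loss after hybrid pruning
ratio = 0.67                # reported pruning ratio

increase_pct = round(100 * relative_increase(full_console_loss, pruned_loss), 1)
kept_pct = round(100 * kept_fraction(ratio))

print(increase_pct)  # 3.2
print(kept_pct)      # 33
```

Under that assumption, about a third of the processors remain while the loss grows by roughly 3%.
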

Discussion

The proposed approach leverages the compositional nature of music mixing to develop a structured method for estimating processing graphs and their parameters. By focusing on iterative pruning and differentiable processing, the method provides interpretable and efficient solutions for matching reference mixes.

Implications

Practical Applications:
- This method can be used to train neural networks for automatic mixing, enhancing the capabilities of music production software.
- The pruned graphs can serve as pseudo-label data for training models that aim to simplify and automate the mixing process.
Theoretical Implications:
- The approach elucidates the compositional aspects of music mixing and provides a systematic method for exploring processing graphs.
- The introduction of differentiable processors and iterative pruning can inspire further research into efficient and scalable methods for complex audio processing tasks.
Future Developments:
- Extending the method to include additional processor types and more sophisticated implementations could improve match quality and computational efficiency.
- Research into alternative search methods, such as reinforcement learning or more flexible graph structures, has the potential to enhance the versatility and performance of the proposed approach.

Conclusion

The paper introduces a method for reverse engineering music mixing graphs that combines differentiable processing with iterative pruning. Its evaluations show that a sparse graph can match the reference mix nearly as well as the full mixing console. The collected graphs and parameters can also serve as training data for neural networks in automatic mixing and other music production tools.
