Remixing Music with Visual Conditioning (2010.14565v1)
Abstract: We propose a visually conditioned music remixing system that incorporates deep visual and audio models. The method builds on a state-of-the-art audio-visual source separation model that performs musical instrument source separation guided by video information. We modify the model to accept user-selected images instead of videos as the visual input during inference, enabling separation of audio-only content. Furthermore, we propose a remixing engine that generalizes the task of source separation into music remixing. The proposed method achieves improved audio quality compared to remixing performed by the separate-and-add method with a state-of-the-art audio-visual source separation model.
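The separate-and-add baseline referenced in the abstract can be illustrated with a short sketch: the separation model produces one stem per instrument, each stem is scaled by a user-chosen gain, and the scaled stems are summed to form the remix. The `separate_and_add_remix` helper and the placeholder stems below are hypothetical and only illustrate the general idea, not the paper's actual implementation.

```python
import numpy as np

def separate_and_add_remix(separated_sources: list[np.ndarray],
                           gains: list[float]) -> np.ndarray:
    """Separate-and-add remixing baseline.

    Each element of `separated_sources` is assumed to be a per-instrument
    stem produced by a source separation model (e.g. an audio-visual model
    conditioned on one user-selected image per instrument). Each stem is
    scaled by its user-chosen linear gain and the results are summed.
    """
    assert len(separated_sources) == len(gains)
    remix = np.zeros_like(separated_sources[0])
    for stem, gain in zip(separated_sources, gains):
        remix += gain * stem
    return remix

if __name__ == "__main__":
    # Placeholder 1-second stems standing in for real separated instruments.
    stems = [np.random.randn(44100).astype(np.float32),
             np.random.randn(44100).astype(np.float32)]
    # Boost the first instrument by 6 dB, attenuate the second by 6 dB.
    gains = [10 ** (6 / 20), 10 ** (-6 / 20)]
    remix = separate_and_add_remix(stems, gains)
    print(remix.shape)
```

Any artifacts introduced by the separation step propagate directly into a remix produced this way, which is the quality limitation the paper's remixing engine aims to reduce.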