Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls (2402.09508v3)

Published 14 Feb 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Controllable music generation plays a vital role in human-AI music co-creation. While LLMs have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. This approach enables autoregressive LLMs to seamlessly address music inpainting tasks. Additionally, our method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. The source code and a demo page showcasing our work are available at https://kikyo-16.github.io/AIR.
