Learning Interpretable Representation for Controllable Polyphonic Music Generation (2008.07122v1)

Published 17 Aug 2020 in cs.SD, cs.CL, cs.LG, and eess.AS

Abstract: While deep generative models have become the leading methods for algorithmic composition, it remains a challenging problem to control the generation process because the latent variables of most deep-learning models lack good interpretability. Inspired by the content-style disentanglement idea, we design a novel architecture, under the VAE framework, that effectively learns two interpretable latent factors of polyphonic music: chord and texture. The current model focuses on learning 8-beat-long piano composition segments. We show that such chord-texture disentanglement provides a controllable generation pathway leading to a wide spectrum of applications, including compositional style transfer, texture variation, and accompaniment arrangement. Both objective and subjective evaluations show that our method achieves successful disentanglement and high-quality controlled music generation.

Authors (4)
  1. Ziyu Wang (137 papers)
  2. Dingsu Wang (5 papers)
  3. Yixiao Zhang (44 papers)
  4. Gus Xia (57 papers)
Citations (62)
