MuPT: A Generative Symbolic Music Pretrained Transformer

(2404.06393)
Published Apr 9, 2024 in cs.SD, cs.AI, and eess.AS

Abstract

In this paper, we explore the application of LLMs to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.

Synchronized multi-track ABC notation merges music segments from bars with the same index to keep tracks aligned.

Overview

  • MuPT introduces a highly specialized transformer model for symbolic music generation, leveraging ABC Notation and a novel Synchronized Multi-Track ABC Notation for enhanced structural integrity.

  • The paper identifies challenges in symbolic music generation, such as keeping measures aligned across tracks, and proposes solutions including a decoder-only transformer architecture and synchronized track handling.

  • Technical innovations in MuPT include an extended token capacity, a novel SMT-ABC Notation system, and an advanced tokenizer implementation optimized for ABC notation.

  • Empirical evidence shows MuPT's superior performance in generating structurally coherent and aesthetically pleasing music; the authors also commit to open-sourcing intermediate training checkpoints to advance community research.

MuPT: Pioneering Symbolic Music Generation with Pretrained Transformers

Introduction to MuPT

The proliferation of LLMs has extended beyond text to diverse domains like music, where structured data representation and coherence across multiple tracks play a critical role in determining the quality of generated outputs. This paper introduces MuPT, a series of highly specialized models engineered for symbolic music generation. Unlike conventional approaches that struggle with MIDI's complex structural representation, MuPT leverages ABC Notation and a novel Synchronized Multi-Track ABC Notation (SMT-ABC Notation) to maintain measure alignment across tracks, significantly enhancing the structural integrity and quality of the generated music.

Challenges in Symbolic Music Generation

Traditional model architectures and data representations face substantial hurdles in generating coherent and structurally sound music. The predominant use of MIDI in symbolic music modeling often results in models failing to capture the essential structural symmetry that characterizes aesthetically pleasing compositions. This paper identifies and addresses these challenges by:

  • Proposing a decoder-only transformer architecture tailored for symbolic music generation tasks.
  • Introducing a synchronized approach to handling multiple music tracks that keeps measures aligned across every part of a composition (a minimal sketch of this idea follows below).
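
To make the synchronization idea concrete, the sketch below interleaves bars that share the same index across tracks, so the model sees every voice of bar i before any voice of bar i+1. This is a minimal illustration, not the paper's implementation: the per-track bar lists, the padding rest, and the `<|>` voice-separator token are all assumptions.

```python
# Illustrative sketch of SMT-ABC-style bar synchronization (not the
# paper's actual implementation). Assumes each track has already been
# split into a list of ABC bar strings, e.g. ["C2 E2", "G2 c2", ...].

def merge_tracks_smt(tracks: list[list[str]], rest: str = "z4") -> str:
    """Interleave bars with the same index across tracks so all voices
    of a given measure appear together in the token stream."""
    n_bars = max(len(t) for t in tracks)
    merged = []
    for i in range(n_bars):
        # Collect bar i from every track; pad short tracks with rests
        # so measures stay aligned even when track lengths differ.
        bars = [t[i] if i < len(t) else rest for t in tracks]
        # "<|>" is a hypothetical voice separator; the real delimiter
        # is whatever token the model's vocabulary reserves for this.
        merged.append(" <|> ".join(bars) + " |")
    return " ".join(merged)

# Two toy voices, two bars each.
melody = ["C2 E2 G2 c2", "B2 G2 E2 C2"]
bass = ["C,4 G,4", "G,4 C,4"]
print(merge_tracks_smt([melody, bass]))
# bar 1 of both voices, then bar 2 of both voices
```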

MuPT Architecture and Innovations

MuPT embodies several technical innovations to optimize performance for music generation tasks:

  • Extended Token Capacity: Models are capable of handling up to 8192 tokens, covering 90% of the symbolic music data in the training set.
  • SMT-ABC Notation: This novel notation system is specifically designed to address the misalignment of measures across different tracks, fostering improved learning outcomes and music quality.
  • Advanced Tokenizer Implementation: The models use the YouTokenToMe framework with a 50,000-token BPE vocabulary optimized for ABC notation, so symbolic music data is encoded efficiently (a usage sketch follows this list).
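
The snippet below shows how such a tokenizer could be trained and applied with the YouTokenToMe library. The 50,000 vocabulary size comes from the paper; the file paths and the ABC corpus are placeholders, and this is a hedged sketch rather than the authors' training pipeline.

```python
import youtokentome as yttm

# Train a BPE model on a plain-text corpus of ABC notation
# ("abc_corpus.txt" is a hypothetical file, one tune per line).
yttm.BPE.train(
    data="abc_corpus.txt",   # placeholder corpus path
    model="abc_bpe.model",   # where the learned merges are stored
    vocab_size=50000,        # vocabulary size reported in the paper
)

# Load the trained model and encode an ABC fragment into token ids.
bpe = yttm.BPE(model="abc_bpe.model")
ids = bpe.encode(
    ["X:1\nL:1/8\nK:C\nC2 E2 G2 c2 | B2 G2 E2 C2 |"],
    output_type=yttm.OutputType.ID,
)
print(ids[0][:16])          # first few token ids
print(bpe.decode(ids)[0])   # round-trip back to text
```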

Scaling Law Insights

The paper's exploration of the Symbolic Music Scaling (SMS) Law offers a new perspective on how model performance scales in the context of music generation:

  • Comprehensive Training Benefits: The SMS Law reveals that extended training on repetitive data can lead to significant performance improvements.
  • Optimal Resource Allocation: Insights from the SMS Law guide the allocation of computational resources, helping models achieve the best possible outcomes within a fixed compute budget (an illustrative sketch of such a law follows this list).
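
The summary does not reproduce the SMS Law's exact functional form, so the sketch below uses a generic Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, with made-up coefficients, purely to illustrate how a fitted scaling law lets one compare ways of spending a compute budget.

```python
# Illustrative only: a generic power-law loss model, NOT the fitted
# SMS Law from the paper. All constants are hypothetical placeholders.

def predicted_loss(n_params: float, n_tokens: float,
                   E=1.7, A=400.0, B=1800.0,
                   alpha=0.34, beta=0.28) -> float:
    """Parametric training-loss estimate as a function of model size
    (n_params) and training tokens (n_tokens)."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Compare two ways to spend a similar budget: a larger model trained
# on fewer tokens vs. a smaller model trained on more (possibly
# repeated) tokens.
print(predicted_loss(1.3e9, 30e9))   # bigger model, fewer tokens
print(predicted_loss(0.5e9, 100e9))  # smaller model, more tokens
```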

Empirical Validation and Community Contributions

Empirical results demonstrate MuPT's superior performance compared to existing baselines. The models achieve remarkable success in generating music that is both structurally coherent and aesthetically pleasing. Furthermore, the paper commits to open-sourcing intermediate training checkpoints and foundational models to stimulate further research and innovation in symbolic music modeling.

Future Directions and Conclusion

MuPT's introduction marks a significant advancement in symbolic music generation, addressing longstanding challenges and setting a new standard for model performance in this domain. The insights from the SMS Law and the open-source release of foundational models and intermediate checkpoints underscore the potential for continued progress in music generation. As the community optimizes and extends MuPT's capabilities, symbolic music modeling looks poised to unlock new levels of creativity and intricacy in automated music composition.
