MuPT: A Generative Symbolic Music Pretrained Transformer

(2404.06393)
Published Apr 9, 2024 in cs.SD, cs.AI, and eess.AS

Abstract

In this paper, we explore the application of LLMs to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the challenges associated with misaligned measures from different tracks during generation, we propose the development of a Synchronized Multi-Track ABC Notation (SMT-ABC Notation), which aims to preserve coherence across multiple musical tracks. Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set. Furthermore, we explore the implications of the Symbolic Music Scaling Law (SMS Law) on model performance. The results indicate a promising direction for future research in music generation, offering extensive resources for community-led research through our open-source contributions.

Synchronized multi-track ABC notation merges music segments from bars with the same index to keep tracks aligned.

Overview

  • MuPT introduces a highly specialized transformer model for symbolic music generation, leveraging ABC Notation and a novel Synchronized Multi-Track ABC Notation for enhanced structural integrity.

  • The paper identifies challenges in symbolic music generation, such as keeping measures aligned across tracks, and proposes solutions including a decoder-only transformer architecture and synchronized track handling.

  • Technical innovations in MuPT include an extended token capacity, a novel SMT-ABC Notation system, and an advanced tokenizer implementation optimized for ABC notation.

  • Empirical evidence shows MuPT's superior performance in generating structurally coherent and aesthetically pleasing music; the authors also commit to open-sourcing intermediate training checkpoints to advance community research.

MuPT: Pioneering Symbolic Music Generation with Pretrained Transformers

Introduction to MuPT

The proliferation of LLMs has extended beyond text to diverse domains like music, where structured data representation and coherence across multiple tracks play a critical role in determining the quality of generated outputs. This paper introduces MuPT, a series of highly specialized models engineered for symbolic music generation. Unlike conventional approaches that struggle with MIDI's complex structural representation, MuPT leverages ABC Notation and a novel Synchronized Multi-Track ABC Notation (SMT-ABC Notation) to maintain measure alignment across tracks, significantly enhancing the structural integrity and quality of the generated music.

Challenges in Symbolic Music Generation

Traditional model architectures and data representations face substantial hurdles in generating coherent and structurally sound music. The predominant use of MIDI in symbolic music modeling often results in models failing to capture the essential structural symmetry that characterizes aesthetically pleasing compositions. This paper identifies and addresses these challenges by:

  • Proposing a decoder-only transformer architecture tailored for symbolic music generation tasks.
  • Introducing a synchronized approach to handling multiple music tracks that keeps measures aligned across every part of a composition (a minimal sketch of this idea follows below).
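
To make the synchronization idea concrete, the sketch below interleaves bars that share the same index across tracks, so the model sees every voice of bar i before any voice of bar i+1. This is a minimal illustration, not the paper's implementation: the per-track bar lists, the padding rest, and the `<|>` voice-separator token are all assumptions.

```python
# Illustrative sketch of SMT-ABC-style bar synchronization (not the
# paper's actual implementation). Assumes each track has already been
# split into a list of ABC bar strings, e.g. ["C2 E2", "G2 c2", ...].

def merge_tracks_smt(tracks: list[list[str]], rest: str = "z4") -> str:
    """Interleave bars with the same index across tracks so all voices
    of a given measure appear together in the token stream."""
    n_bars = max(len(t) for t in tracks)
    merged = []
    for i in range(n_bars):
        # Collect bar i from every track; pad short tracks with rests
        # so measures stay aligned even when track lengths differ.
        bars = [t[i] if i < len(t) else rest for t in tracks]
        # "<|>" is a hypothetical voice separator; the real delimiter
        # is whatever token the model's vocabulary reserves for this.
        merged.append(" <|> ".join(bars) + " |")
    return " ".join(merged)

# Two toy voices, two bars each.
melody = ["C2 E2 G2 c2", "B2 G2 E2 C2"]
bass = ["C,4 G,4", "G,4 C,4"]
print(merge_tracks_smt([melody, bass]))
# bar 1 of both voices, then bar 2 of both voices
```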

MuPT Architecture and Innovations

MuPT embodies several technical innovations to optimize performance for music generation tasks:

  • Extended Token Capacity: Models are capable of handling up to 8192 tokens, covering 90% of the symbolic music data in the training set.
  • SMT-ABC Notation: This novel notation system is specifically designed to address the misalignment of measures across different tracks, fostering improved learning outcomes and music quality.
  • Advanced Tokenizer Implementation: The models use the YouTokenToMe framework with a 50,000-token BPE vocabulary optimized for ABC notation, so symbolic music data is encoded efficiently (a usage sketch follows this list).
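
The snippet below shows how such a tokenizer could be trained and applied with the YouTokenToMe library. The 50,000 vocabulary size comes from the paper; the file paths and the ABC corpus are placeholders, and this is a hedged sketch rather than the authors' training pipeline.

```python
import youtokentome as yttm

# Train a BPE model on a plain-text corpus of ABC notation
# ("abc_corpus.txt" is a hypothetical file, one tune per line).
yttm.BPE.train(
    data="abc_corpus.txt",   # placeholder corpus path
    model="abc_bpe.model",   # where the learned merges are stored
    vocab_size=50000,        # vocabulary size reported in the paper
)

# Load the trained model and encode an ABC fragment into token ids.
bpe = yttm.BPE(model="abc_bpe.model")
ids = bpe.encode(
    ["X:1\nL:1/8\nK:C\nC2 E2 G2 c2 | B2 G2 E2 C2 |"],
    output_type=yttm.OutputType.ID,
)
print(ids[0][:16])          # first few token ids
print(bpe.decode(ids)[0])   # round-trip back to text
```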

Scaling Law Insights

The paper's exploration of the Symbolic Music Scaling (SMS) Law offers a new perspective on how model performance scales in the context of music generation:

  • Comprehensive Training Benefits: The SMS Law reveals that extended training on repetitive data can lead to significant performance improvements.
  • Optimal Resource Allocation: Insights from the SMS Law guide the allocation of computational resources, helping models achieve the best possible outcomes within a fixed compute budget (an illustrative sketch of such a law follows this list).
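
The summary does not reproduce the SMS Law's exact functional form, so the sketch below uses a generic Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, with made-up coefficients, purely to illustrate how a fitted scaling law lets one compare ways of spending a compute budget.

```python
# Illustrative only: a generic power-law loss model, NOT the fitted
# SMS Law from the paper. All constants are hypothetical placeholders.

def predicted_loss(n_params: float, n_tokens: float,
                   E=1.7, A=400.0, B=1800.0,
                   alpha=0.34, beta=0.28) -> float:
    """Parametric training-loss estimate as a function of model size
    (n_params) and training tokens (n_tokens)."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Compare two ways to spend a similar budget: a larger model trained
# on fewer tokens vs. a smaller model trained on more (possibly
# repeated) tokens.
print(predicted_loss(1.3e9, 30e9))   # bigger model, fewer tokens
print(predicted_loss(0.5e9, 100e9))  # smaller model, more tokens
```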

Empirical Validation and Community Contributions

Empirical results demonstrate MuPT's superior performance compared to existing baselines. The models achieve remarkable success in generating music that is both structurally coherent and aesthetically pleasing. Furthermore, the paper commits to open-sourcing intermediate training checkpoints and foundational models to stimulate further research and innovation in symbolic music modeling.

Future Directions and Conclusion

MuPT's introduction marks a significant advancement in symbolic music generation, addressing longstanding challenges and setting a new standard for model performance in this domain. The insights from the SMS Law and the open-source release of foundational models and intermediate checkpoints underscore the potential for continued progress in music generation. As the community optimizes and extends MuPT's capabilities, symbolic music modeling looks poised to unlock new levels of creativity and intricacy in automated music composition.
