
Simple and Effective Masked Diffusion Language Models (2406.07524v2)

Published 11 Jun 2024 in cs.CL, cs.AI, and cs.LG

Abstract: While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form -- it is a mixture of classical masked language modeling losses -- and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We provide the code, along with a blog post and video tutorial on the project page: https://s-sahoo.com/mdlm
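
To make the "mixture of classical masked language modeling losses" concrete, the following is a minimal sketch of a weighted masked cross-entropy objective in that spirit. It is not the paper's released code: it assumes a log-linear noise schedule alpha_t = 1 - t (under which the per-sequence weight reduces to 1/t), and the MASK_ID constant and the model(x) -> logits interface are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

MASK_ID = 103  # hypothetical [MASK] token id; depends on the tokenizer


def masked_diffusion_loss(model, x0, eps=1e-4):
    """Sketch of a weighted masked-LM loss in the spirit of the paper's objective.

    Assumes a log-linear schedule alpha_t = 1 - t, so P(mask) = t and the
    weight -alpha_t' / (1 - alpha_t) simplifies to 1 / t.

    x0: LongTensor (batch, seq_len) of clean token ids.
    model: any encoder mapping token ids -> logits of shape (batch, seq_len, vocab).
    """
    b, l = x0.shape
    # Sample a diffusion time t ~ U(eps, 1) per sequence.
    t = torch.rand(b, 1, device=x0.device) * (1 - eps) + eps
    # Mask each token independently with probability 1 - alpha_t = t.
    is_masked = torch.rand(b, l, device=x0.device) < t
    xt = torch.where(is_masked, torch.full_like(x0, MASK_ID), x0)

    logits = model(xt)  # (b, l, vocab)
    nll = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")  # (b, l)
    # Only masked positions contribute; weight each sequence by 1 / t.
    loss = ((nll * is_masked).sum(dim=1) / t.squeeze(1)).mean()
    return loss
```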


Summary

  • The paper introduces a masked diffusion language model built around the SUBS parameterization, reported both when trained from scratch and when fine-tuned from a pre-trained model.
  • It relies on a masking-based corruption process, so the model learns to reconstruct masked tokens from the surrounding unmasked context, a setup that maps naturally onto masked language modeling.
  • Experimental results show that the fine-tuned model outperforms competing methods on genomic sequence classification, with notable accuracy gains on datasets such as Mouse Enhancers and Human NonTATA Promoters.

Simple and Effective Masked Diffusion Language Models

This paper introduces a simplified approach to masked diffusion language models, aiming for strong performance with minimal complexity. The central component is the SUBS parameterization of the denoising model; in the genomic sequence classification experiments summarized here, the resulting models are evaluated both when trained from scratch and when fine-tuned from a pre-trained backbone.

Methodology

The paper diverges from Gaussian diffusion by using an absorbing-state forward process that selectively corrupts the input sequence, replacing tokens with a mask token according to a noise schedule. The denoising model then predicts the original tokens from the remaining unmasked context, so training reduces to a weighted form of masked language modeling. For the genomic experiments, two regimes are compared: training the masked diffusion model from scratch and fine-tuning it from a pre-trained model. Combining general pre-training with task-specific fine-tuning lets the model learn broad sequence patterns and then adapt to each classification task, as illustrated by the sampling sketch below.
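
Because unmasked tokens are never re-corrupted, generation can proceed by iteratively revealing masked positions. The following is a minimal sketch of such an unmasking sampler, not the paper's implementation: the linear reveal schedule, the MASK_ID constant, and the model(x) -> logits interface are illustrative assumptions carried over from the earlier sketch.

```python
import torch

MASK_ID = 103  # hypothetical [MASK] token id, as in the loss sketch above


@torch.no_grad()
def unmasking_sampler(model, seq_len, num_steps=64, device="cpu"):
    """Sketch of an iterative unmasking sampler for a masked diffusion model.

    Starts from an all-[MASK] sequence and, over num_steps rounds, reveals a
    growing fraction of positions by sampling from the model's predictions.
    Once a position is revealed it is never re-masked.
    """
    x = torch.full((1, seq_len), MASK_ID, dtype=torch.long, device=device)
    for step in range(num_steps):
        logits = model(x)                                  # (1, seq_len, vocab)
        probs = torch.softmax(logits, dim=-1)
        sampled = torch.distributions.Categorical(probs=probs).sample()
        still_masked = (x == MASK_ID)
        # Reveal roughly 1/k of the remaining masked positions this round.
        reveal_prob = 1.0 / (num_steps - step)
        reveal = still_masked & (torch.rand_like(x, dtype=torch.float) < reveal_prob)
        x = torch.where(reveal, sampled, x)                # revealed tokens stay fixed
    return x
```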

Experimental Results

The efficacy of the proposed method is demonstrated through a series of experiments on genomic sequence classification tasks. The results indicate that SUBS achieves competitive performance compared to existing methods such as Mamba, SEDD, Caduceus, Plaid, and D3PM. Notably, the fine-tuned SUBS model often outperforms the alternatives, highlighting the benefit of fine-tuning from a pre-trained model. For instance, on the "Mouse Enhancers" dataset, SUBS (fine-tuned) achieves an accuracy of 0.795, outperforming Mamba (0.763) and D3PM (0.787). Similarly, on the "Human NonTATA Promoters" dataset, SUBS (fine-tuned) reaches an accuracy of 0.938, surpassing Mamba (0.926) and SEDD (0.935).

Implications and Future Directions

The presented approach offers a practical and efficient method for training masked diffusion language models. The simplicity of the masking strategy and the effectiveness of the SUBS training regime make it an attractive option for researchers and practitioners. Future work could explore the application of this method to other language modeling tasks and investigate the impact of different masking strategies and fine-tuning techniques. Additionally, it would be interesting to analyze the computational efficiency and scalability of the proposed method in more detail.

Conclusion

The paper introduces a refined approach to masked diffusion language models, achieving commendable results on genomic sequence classification tasks. The simplicity and efficacy of the proposed method make it a valuable addition to the field, offering a strong baseline for future research.
