
Discrete Flow Matching

(arXiv:2407.15595)
Published Jul 22, 2024 in cs.LG and cs.AI

Abstract

Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers several key contributions: (i) it works with a general family of probability paths interpolating between source and target distributions; (ii) it allows for a generic formula for sampling from these probability paths using learned posteriors such as the probability denoiser ($x$-prediction) and noise-prediction ($\epsilon$-prediction); (iii) practically, focusing on specific probability paths defined with different schedulers considerably improves generative perplexity compared to previous discrete diffusion and flow models; and (iv) by scaling Discrete Flow Matching models up to 1.7B parameters, we reach 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks. Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion, significantly closing the gap between autoregressive models and discrete flow models.

Overview

  • Discrete Flow Matching (DFM) introduces a new discrete flow paradigm aimed at improving the generation quality of high-dimensional discrete data such as language.

  • The paper presents methodological advancements including flexible probability paths, unified sampling mechanisms, and non-autoregressive generation, significantly enhancing generative performance.

  • Empirical evaluations demonstrate substantial improvements in generative perplexity and coding-benchmark scores, validating the model's effectiveness over previous discrete diffusion and flow models.

Discrete Flow Matching: A Formal Overview

Discrete generative models are essential for handling high-dimensional discrete data such as language. However, flow matching and diffusion models have not yet matched, in this setting, the effectiveness they achieve on continuous variables such as images and videos. The paper "Discrete Flow Matching" introduces a novel discrete flow paradigm designed to close this gap. This summary explores the methodological advancements and empirical results presented in the paper.

Methodological Contributions

1. Flexible Probability Paths: Discrete Flow Matching (DFM) accommodates a general family of probability paths interpolating between a source and a target distribution. This generality admits arbitrary couplings between source and target, enabling the design of more flexible generative processes (see the sketch following this list).

2. Sampling Mechanisms: The authors provide a unified sampling formula applicable to various learned posteriors, covering both x-prediction (probability denoising) and ϵ-prediction (noise prediction) routines, which adds versatility. The study shows that tuning path schedulers and applying corrector sampling substantially improves generative performance.

3. Non-Autoregressive Generation: A key feature of DFM is its ability to generate high-quality discrete data non-autoregressively: every sequence position can be updated in parallel at each sampling step, as illustrated in the sketch below. This has significant implications for inference efficiency and scalability.
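To make these contributions concrete, here is a minimal sketch of the simplest instance of the framework: a per-token mixture path with a linear scheduler $\kappa_t = t$, together with the corresponding parallel Euler sampling step driven by an x-prediction posterior. All names and sizes here (the `denoiser` stand-in, `VOCAB`, `MASK`, the scheduler choice) are illustrative assumptions for exposition, not the paper's released code.

```python
import torch

# --- Illustrative setup; names and sizes are assumptions, not the paper's code ---
VOCAB = 128           # token vocabulary size
MASK = VOCAB - 1      # reserve one id as the mask/source token
SEQ_LEN = 32

def kappa(t):
    """Linear scheduler kappa_t = t; the paper studies other schedulers."""
    return t

def kappa_dot(t):
    return torch.ones_like(t)

def sample_path(x0, x1, t):
    """Draw x_t from the per-token mixture path (used to corrupt data at training time):
    p_t(x^i | x0, x1) = (1 - kappa_t) * delta(x^i, x0^i) + kappa_t * delta(x^i, x1^i)."""
    take_target = torch.rand(x1.shape) < kappa(t)
    return torch.where(take_target, x1, x0)

def euler_step(z, t, h, denoiser):
    """One parallel (non-autoregressive) sampling step for the mixture path.
    `denoiser(z, t)` stands in for the learned posterior p_{1|t}(x^i | z)
    (x-prediction) and returns logits of shape (SEQ_LEN, VOCAB).
    Each token is independently resampled from the posterior with
    probability h * kappa_dot / (1 - kappa), and otherwise kept as-is."""
    lam = kappa_dot(t) / (1.0 - kappa(t))
    proposal = torch.distributions.Categorical(logits=denoiser(z, t)).sample()
    jump = torch.rand(z.shape) < torch.clamp(h * lam, max=1.0)
    return torch.where(jump, proposal, z)

# Toy run: start at the all-mask source and integrate t from 0 towards 1.
denoiser = lambda z, t: torch.zeros(SEQ_LEN, VOCAB)  # placeholder: uniform posterior
z = torch.full((SEQ_LEN,), MASK)
steps = 50
for k in range(steps):
    t = torch.tensor(k / steps)
    z = euler_step(z, t, 1.0 / steps, denoiser)
```

Note that with the linear scheduler the jump probability $h\,\dot{\kappa}_t/(1-\kappa_t)$ grows as $t \to 1$, so every position is resampled by the final step; other schedulers reallocate this budget across time, which is one of the design knobs the paper tunes.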

Empirical Evaluation

Performance Metrics: The paper demonstrates notable improvements in generative perplexity over previous discrete diffusion and flow models. Scaled to 1.7 billion parameters, DFM achieves:

  • 6.7% Pass@1 and 13.4% Pass@10 on HumanEval benchmarks
  • 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks

Additionally, DFM outperformed a comparable 1.7-billion-parameter autoregressive model in generative perplexity, achieving 9.7 compared to 22.3, and closely approached the Llama-2 7B model's score of 8.3.

Theoretical Analysis

The paper's theoretical contributions ground the proposed method's efficacy. It extends the Continuous-Time Markov Chain (CTMC) framework from prior work and introduces practical algorithms for correctly implementing discrete flow models. In particular, the continuity equation is shown to hold in the discrete setting analogously to the continuous case, ensuring mathematically sound transitions between probability states.
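Schematically, writing $u_t(x, z)$ for the probability velocity (the rate at which mass moves from state $z$ to state $x$), the discrete continuity equation is the Kolmogorov forward equation of the CTMC:

$$\frac{\mathrm{d}}{\mathrm{d}t}\, p_t(x) \;=\; \sum_{z} u_t(x, z)\, p_t(z), \qquad \text{with } u_t(x, z) \ge 0 \text{ for } x \neq z \text{ and } \sum_{x} u_t(x, z) = 0.$$

The rate conditions guarantee that total probability mass is conserved as it flows between discrete states; a velocity satisfying this equation is said to generate the path $p_t$.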

Conditional and Marginal Velocities: The paper derives probability velocities that generate the paths in both the conditional and marginal settings (displayed below). These derivations provide theoretical backing for the practical applicability of DFM and for the correctness of its sampling procedure.
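Schematically, and in direct analogy with the marginalization trick of continuous Flow Matching, the marginal velocity is a posterior-weighted average of conditional velocities:

$$u_t(x, z) \;=\; \sum_{x_0, x_1} u_t(x, z \mid x_0, x_1)\, \frac{p_t(z \mid x_0, x_1)\, \pi(x_0, x_1)}{p_t(z)},$$

where $\pi$ is the coupling of source and target. For the per-token mixture path with scheduler $\kappa_t$, this average reduces to an expression in the learned denoiser alone, for example $u_t^i(x^i, z) = \frac{\dot{\kappa}_t}{1-\kappa_t}\big[p_{1|t}(x^i \mid z) - \delta_{z^i}(x^i)\big]$, which is the jump rate used in the sampling sketch above.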

Practical Implications and Future Directions

The advancements in this paper mark significant progress toward bridging the performance gap between autoregressive models and discrete flow models, reducing inference time without compromising quality. The broad applicability to discrete data such as language and code opens new avenues for richer and more scalable AI applications.

Future Work: The potential of non-autoregressive models warrants further exploration, particularly in:

  • Enhancing sampling efficiency to rival continuous counterparts.
  • Delving deeper into the diverse design space provided by the generalized probability paths.

In conclusion, Discrete Flow Matching offers a significant leap forward in modeling high-dimensional discrete data, with substantial improvements in generative quality and efficiency. The theoretical underpinnings and empirical results position DFM as a cornerstone methodology for future research and application in discrete data generative models.
