
Particle Semi-Implicit Variational Inference

(arXiv:2407.00649)
Published Jun 30, 2024 in stat.ML and cs.LG

Abstract

Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible and so, they resort to either: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a natural free energy functional via a particle approximation of an Euclidean--Wasserstein gradient flow. This approach means that, unlike prior works, PVI can directly optimize the ELBO; furthermore, it makes no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably against other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results.

Figure: PVI vs. PVIZero on a bimodal Gaussian mixture with different kernels, showing the density \( q_{\theta,r} \).

Overview

  • The paper presents Particle Semi-Implicit Variational Inference (PVI), a novel approach that enhances Semi-Implicit Variational Inference (SIVI) by using empirical particle-based methods to approximate optimal mixing distributions.

  • PVI directly maximizes the Evidence Lower Bound (ELBO) without parametric assumptions, through a particle approximation of a Euclidean--Wasserstein gradient flow.

  • Theoretical contributions include the construction of gradient flows in the Euclidean--Wasserstein space and rigorous guarantees for the existence and uniqueness of solutions; empirically, PVI performs favourably against other SIVI methods across a range of tasks.

Particle Semi-Implicit Variational Inference

Introduction

The paper "Particle Semi-Implicit Variational Inference" (PVI) by Lim and Johansen introduces a novel approach to Semi-Implicit Variational Inference (SIVI). SIVI enriches variational families through the use of kernels and mixing distributions, significantly enhancing their expressiveness. Existing SIVI methodologies, however, grapple with intractable variational densities due to the use of implicit distributions for parameterizing the mixing distributions. This inevitably results in the necessity for alternative optimization strategies such as bounds on the Evidence Lower Bound (ELBO) or minimax formulations. The proposed work overcomes these hurdles by introducing the PVI method, which approximates the optimal mixing distributions via an empirical particle-based approach and optimizes the ELBO directly without parametric assumptions.

Methodology

Semi-Implicit Variational Inference (SIVI)

At the core of Bayesian inference lies the posterior distribution \( p(x|y) \), which is often intractable. Variational Inference (VI) tackles this by approximating the posterior with a distribution \( q_\theta(x) \) from a variational family indexed by parameters \( \theta \). SIVI enhances the flexibility of variational distributions by employing semi-implicit distributions (SIDs) of the form
\[ q_{k, r}(x) = \int k(x|z)\, r(z)\, \mathrm{d}z, \]
where \( k \) is a kernel and \( r \) is the mixing distribution. While SIDs are capable of expressing complex properties such as multimodality and skewness, existing methods face significant challenges in optimizing the ELBO directly due to the intractability of \( q_{k, r} \).
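
The following minimal sketch (NumPy; all choices here, including the bimodal mixing distribution, Gaussian kernel, and bandwidth, are illustrative rather than taken from the paper) shows the two operations SIVI relies on: sampling from \( q_{k,r} \) hierarchically is easy, while evaluating its density generally requires Monte Carlo over the mixing distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5  # kernel bandwidth (illustrative choice)

def sample_r(n):
    """Draw z ~ r: here, an equal-weight mixture of N(-2, 1) and N(+2, 1)."""
    means = rng.choice([-2.0, 2.0], size=n)
    return means + rng.normal(size=n)

def sample_q(n):
    """Draw x ~ q_{k,r} hierarchically: z ~ r, then x ~ k(.|z) = N(z, sigma^2)."""
    z = sample_r(n)
    return z + sigma * rng.normal(size=n)

def q_density(xs, n_mc=10_000):
    """Monte Carlo estimate of q_{k,r}(x) = E_{z~r}[k(x|z)] on a grid of points xs."""
    z = sample_r(n_mc)
    k = np.exp(-0.5 * ((xs[:, None] - z[None, :]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return k.mean(axis=1)

xs = np.linspace(-5.0, 5.0, 201)
print(sample_q(5))          # a few hierarchical samples
print(q_density(xs).max())  # the density is only accessible via Monte Carlo here
```

When \( r \) is implicit and only accessible through samples, such a Monte Carlo estimate is all that is available, which is precisely what makes direct ELBO optimization difficult for existing SIVI methods.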

Particle Variational Inference (PVI)

PVI addresses these challenges by utilizing empirical measures to approximate the optimal mixing distribution, via a particle approximation of a Euclidean--Wasserstein gradient flow. Notably, unlike prior SIVI algorithms, PVI can optimize the ELBO directly: it minimizes the free energy
\[ \mathcal{E}(k, r) := \int \log \frac{q_{k,r}(x)}{p(x, y)}\, q_{k,r}(\mathrm{d}x), \]
which is the negative ELBO, so minimizing \( \mathcal{E} \) over the kernel \( k \) (parameterized by \( \theta \)) and the mixing distribution \( r \) is equivalent to maximizing the ELBO. The PVI algorithm constructs a gradient flow in the Euclidean--Wasserstein geometry and discretizes it to yield practical updates for both \( \theta \) and \( r \).
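
The sketch below illustrates one way such a particle scheme can look in practice: the mixing distribution is replaced by an empirical measure over particles, the free energy (negative ELBO) is estimated by Monte Carlo with reparameterized samples, and both the kernel parameter and the particles are updated by gradient steps. This is a simplified sketch under illustrative choices (Gaussian kernel, toy bimodal target, a single Adam optimizer for brevity), not the authors' exact algorithm.

```python
import math
import torch

torch.manual_seed(0)

def log_p(x):
    """Unnormalized log target: a bimodal 1-D Gaussian mixture (stand-in for log p(x, y))."""
    return torch.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

N, M = 64, 8                                     # particles, MC samples per particle
z = torch.randn(N, requires_grad=True)           # particles approximating r
log_sigma = torch.zeros(1, requires_grad=True)   # kernel parameter theta
opt = torch.optim.Adam([z, log_sigma], lr=5e-2)

def log_q(x, z, sigma):
    """log q_{theta,r}(x) with r replaced by the empirical measure of the particles."""
    logk = (-0.5 * ((x[..., None] - z) / sigma) ** 2
            - torch.log(sigma) - 0.5 * math.log(2 * math.pi))
    return torch.logsumexp(logk, dim=-1) - math.log(N)

for step in range(2000):
    opt.zero_grad()
    sigma = torch.nn.functional.softplus(log_sigma)
    x = z[:, None] + sigma * torch.randn(N, M)   # reparameterized draws x ~ q_{theta,r}
    free_energy = (log_q(x, z, sigma) - log_p(x)).mean()  # MC estimate of the negative ELBO
    free_energy.backward()                       # gradients w.r.t. theta and the particles
    opt.step()
```

Gradients reach each particle both through the samples it generates and through its contribution to the density estimate; a plain gradient step on the particles can then be read as a crude Euler discretization of the particle system induced by the Wasserstein component of the flow.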

Results and Analysis

Empirical Performance

Empirically, PVI performs favourably compared with other SIVI approaches across a range of tasks, including toy problems and high-dimensional Bayesian neural network regression. These tasks underscore PVI's ability to approximate complex posteriors accurately.

Theoretical Contributions

  1. Gradient Flow Construction: The authors introduce a gradient flow minimizing a regularized version of the free energy, \( \mathcal{E}_\lambda \), in the Euclidean--Wasserstein space (see the schematic equations after this list).
  2. Algorithmic Development: A practical PVI algorithm is derived via discretization, leveraging empirical measures to navigate the variational space without reliance on MCMC or minimax optimization.
  3. Existence and Uniqueness of Solutions: Theoretical analysis establishes the existence and uniqueness of solutions to the gradient flow of \( \mathcal{E}_\lambda \), providing a rigorous foundation for the proposed method.
  4. Propagation of Chaos: The paper proves propagation of chaos results for the gradient flow, ensuring the particle-based approximation's validity.
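
Schematically, and omitting the regularization term and technical conditions (this is the generic form of a coupled Euclidean--Wasserstein gradient flow rather than a verbatim statement from the paper), the flow couples an ordinary gradient flow on \( \theta \) with a Wasserstein gradient flow on \( r \):
\[
\frac{\mathrm{d}\theta_t}{\mathrm{d}t} = -\nabla_\theta\, \mathcal{E}(\theta_t, r_t),
\qquad
\partial_t r_t = \nabla \cdot \Big( r_t\, \nabla_z \frac{\delta \mathcal{E}}{\delta r}(\theta_t, r_t) \Big).
\]
Replacing \( r_t \) by an empirical measure over \( N \) particles and discretizing in time yields particle updates of the kind sketched above.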

Implications and Future Directions

Practical Implications

PVI enables more efficient and accurate variational inference by removing the need for surrogate bounds on the ELBO and for complex parameterizations of the mixing distribution. This makes it particularly valuable for applications requiring high-dimensional, expressive variational approximations, such as Bayesian neural networks and other hierarchical models.

Theoretical Implications

The introduction of PVI opens new avenues for further investigations into gradient flows in the Euclidean--Wasserstein space, particularly in variational inference contexts. The theoretical guarantees provided by the authors encourage exploration into more generalized settings and more complex variational families.

Future Developments

Future research can focus on extending PVI to more intricate posterior distributions and to more scalable implementations. Moreover, exploring alternative kernels and mixing distributions optimized by similar particle-based approaches might yield even more powerful inference algorithms. Further investigation into the convergence properties of PVI, and into its theoretical foundations in the unregularized setting (when the regularization parameter is zero), would solidify its standing and broaden its applicability.

Conclusion

Lim and Johansen’s work on Particle Semi-Implicit Variational Inference is a significant stride in the field of variational inference, offering an innovative solution to the intractability issues inherent in semi-implicit distributions. By leveraging particle approximations and gradient flows in the Euclidean--Wasserstein space, PVI stands out as a robust and theoretically grounded approach capable of addressing complex inference challenges effectively.
