- The paper introduces PVI, which uses particle approximations to optimize the ELBO directly, sidestepping the intractable variational densities that hamper existing SIVI methods.
- The method leverages a gradient flow in Euclidean–Wasserstein space, yielding a practical algorithm that outperforms traditional SIVI techniques.
- Empirical results demonstrate PVI's strong performance across a range of tasks, from toy problems to high-dimensional Bayesian neural network regression.
Particle Semi-Implicit Variational Inference
Introduction
The paper "Particle Semi-Implicit Variational Inference" (PVI) by Lim and Johansen introduces a novel approach to Semi-Implicit Variational Inference (SIVI). SIVI enriches variational families through the use of kernels and mixing distributions, significantly enhancing their expressiveness. Existing SIVI methodologies, however, grapple with intractable variational densities due to the use of implicit distributions for parameterizing the mixing distributions. This inevitably results in the necessity for alternative optimization strategies such as bounds on the Evidence Lower Bound (ELBO) or minimax formulations. The proposed work overcomes these hurdles by introducing the PVI method, which approximates the optimal mixing distributions via an empirical particle-based approach and optimizes the ELBO directly without parametric assumptions.
Methodology
Semi-Implicit Variational Inference (SIVI)
At the core of Bayesian inference lies the posterior distribution $p(x \mid y)$, which is often intractable. Variational Inference (VI) tackles this by approximating the posterior with a distribution $q_\theta(x)$ from a variational family indexed by parameters $\theta$. SIVI enhances the flexibility of variational distributions by employing semi-implicit distributions (SIDs) of the form
$$q_{k,r}(x) = \int k(x \mid z)\, r(z)\, \mathrm{d}z.$$
Here, $k$ is a kernel and $r$ is the mixing distribution. While SIDs are capable of expressing complex properties such as multimodality and skewness, existing methods face significant challenges in optimizing the ELBO directly due to the intractability of $q_{k,r}$.
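To make this intractability concrete, here is a minimal, illustrative PyTorch sketch (not code from the paper) of a semi-implicit distribution: the kernel is an explicit Gaussian, while the mixing distribution is the push-forward of noise through a small neural network, so sampling is cheap but the marginal density has no closed form. All names (`mixing_net`, `sample_q`, `kernel_std`) are hypothetical.

```python
import torch
import torch.nn as nn

# Toy semi-implicit distribution q_{k,r}(x) = \int k(x|z) r(z) dz:
# the kernel k(.|z) is an explicit Gaussian, but the mixing distribution r is
# implicit (a push-forward of noise through a neural network), so the marginal
# density q_{k,r}(x) has no closed form even though sampling is easy.
latent_dim, x_dim = 4, 2
mixing_net = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, x_dim))

def sample_q(n_samples, kernel_std=0.5):
    eps = torch.randn(n_samples, latent_dim)    # noise driving the implicit r
    z = mixing_net(eps)                         # z ~ r (samples only, no density)
    x = z + kernel_std * torch.randn_like(z)    # x ~ k(. | z) = N(z, kernel_std^2 I)
    return x

samples = sample_q(1000)   # cheap to draw; log q_{k,r}(samples) cannot be evaluated
```

This is precisely the situation SIVI methods must work around: the ELBO involves $\log q_{k,r}(x)$, which cannot be evaluated here.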
Particle Variational Inference (PVI)
PVI addresses the above challenges by utilizing empirical measures to approximate the optimal mixing distribution. This is achieved through a particle approximation of a Euclidean--Wasserstein gradient flow. Notably, unlike prior SIVI algorithms, PVI can directly maximize the ELBO
$$\mathcal{E}(k,r) := \int \log \frac{p(x,y)}{q_{k,r}(x)}\, q_{k,r}(\mathrm{d}x).$$
The PVI algorithm constructs a gradient flow in the Euclidean--Wasserstein geometry, which is discretized to yield practical updates for both $\theta$ and $r$.
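The key computational consequence is that, once $r$ is replaced by an empirical measure over particles $\{z_i\}_{i=1}^N$, the variational density becomes a finite mixture $\frac{1}{N}\sum_i k(x \mid z_i)$ with a tractable log-density. The following sketch (illustrative only, not the authors' implementation; `log_joint`, `kernel_std`, and `n_mc` are assumed names) shows how this yields a direct, differentiable Monte Carlo estimate of the ELBO:

```python
import math
import torch

# Sketch: approximate the mixing distribution r by N particles {z_i}, so that
#   q(x) ≈ (1/N) * sum_i k(x | z_i)
# is a finite mixture whose log-density is available, making the ELBO estimable
# (and differentiable) by straightforward Monte Carlo.

def log_q(x, particles, kernel_std):
    """Log-density of the particle mixture with Gaussian kernel N(x; z, kernel_std^2 I).

    x: (M, d) evaluation points, particles: (N, d), kernel_std: scalar tensor.
    """
    n, d = particles.shape
    diff = x.unsqueeze(1) - particles.unsqueeze(0)          # (M, N, d)
    log_k = (-0.5 * (diff / kernel_std).pow(2).sum(-1)
             - d * torch.log(kernel_std)
             - 0.5 * d * math.log(2 * math.pi))             # (M, N)
    return torch.logsumexp(log_k, dim=1) - math.log(n)      # (M,)

def elbo_estimate(log_joint, particles, kernel_std, n_mc=64):
    """Reparameterized Monte Carlo estimate of E_q[log p(x, y) - log q(x)]."""
    idx = torch.randint(particles.shape[0], (n_mc,))
    x = particles[idx] + kernel_std * torch.randn(n_mc, particles.shape[1])
    return (log_joint(x) - log_q(x, particles, kernel_std)).mean()
```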
Results and Analysis
Empirical Performance
Empirically, PVI outperforms other SIVI approaches across a range of tasks, from toy problems to high-dimensional Bayesian neural network regression, underscoring its ability to approximate complex posteriors accurately.
Theoretical Contributions
- Gradient Flow Construction: The authors introduce a gradient flow that minimizes a regularized version of the free energy, denoted $\mathcal{E}_\lambda$, in the Euclidean--Wasserstein space.
- Algorithmic Development: A practical PVI algorithm is derived via discretization, leveraging empirical measures to navigate the variational space without reliance on MCMC or minimax optimization (a schematic sketch follows this list).
- Existence and Uniqueness of Solutions: Theoretical analysis establishes the existence and uniqueness of solutions to the gradient flow of $\mathcal{E}_\lambda$, providing a rigorous foundation for the proposed method.
- Propagation of Chaos: The paper proves propagation-of-chaos results for the associated particle system, justifying the use of finitely many particles to approximate the gradient flow.
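As referenced in the algorithmic-development point above, one way to picture the discretized scheme is the following schematic loop. It is a sketch under the assumption of a Gaussian kernel with a learnable log-bandwidth; it reuses `elbo_estimate` from the earlier sketch and does not reproduce the paper's exact drift or regularization terms, and `pvi_step`, `lr_theta`, `lr_z`, and `reg` are hypothetical names.

```python
import torch

def pvi_step(particles, log_kernel_std, log_joint, lr_theta=1e-3, lr_z=1e-2, reg=1e-2):
    """One schematic update: gradient ascent on the ELBO in theta, plus a noisy
    gradient step on the particles (a crude time-discretization of the flow)."""
    particles = particles.detach().requires_grad_(True)
    elbo = elbo_estimate(log_joint, particles, log_kernel_std.exp())  # from the sketch above
    grad_z, grad_theta = torch.autograd.grad(elbo, (particles, log_kernel_std))

    with torch.no_grad():
        log_kernel_std += lr_theta * grad_theta            # ascend the ELBO in theta
        # Particle move: ascend the ELBO and inject Gaussian noise; `reg` stands in
        # for the entropic regularization that keeps the flow well behaved.
        particles = (particles + lr_z * grad_z
                     + (2.0 * lr_z * reg) ** 0.5 * torch.randn_like(particles))
    return particles.detach(), log_kernel_std

# Usage on a toy standard-normal target, with 100 particles and a learnable bandwidth.
def log_joint(x):
    return -0.5 * x.pow(2).sum(-1)

particles = torch.randn(100, 2)
log_kernel_std = torch.zeros(1, requires_grad=True)
for _ in range(500):
    particles, log_kernel_std = pvi_step(particles, log_kernel_std, log_joint)
```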
Implications and Future Directions
Practical Implications
PVI enables more efficient and accurate variational inference by eliminating the need for surrogate bounds on the ELBO or for complex parameterizations of the mixing distribution. This makes it particularly valuable for applications requiring expressive, high-dimensional variational approximations, such as Bayesian neural networks and other hierarchical models.
Theoretical Implications
The introduction of PVI opens new avenues for further investigations into gradient flows in the Euclidean--Wasserstein space, particularly in variational inference contexts. The theoretical guarantees provided by the authors encourage exploration into more generalized settings and more complex variational families.
Future Developments
Future research can focus on extending PVI to handle even more intricate posterior distributions and on more scalable implementations. Moreover, exploring alternative kernels and mixing distributions optimized with similar particle-based approaches may yield even more powerful inference algorithms. Further investigation into the convergence properties of PVI, and into its theoretical foundations when the regularization parameter is set to zero ($\lambda = 0$), would solidify its standing and broaden its applicability.
Conclusion
Lim and Johansen’s work on Particle Semi-Implicit Variational Inference is a significant stride in the field of variational inference, offering an innovative solution to the intractability issues inherent in semi-implicit distributions. By leveraging particle approximations and gradient flows in the Euclidean--Wasserstein space, PVI stands out as a robust and theoretically grounded approach capable of addressing complex inference challenges effectively.