Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

Published 14 May 2020 in cs.LG and stat.ML | (2005.07186v2)

Abstract: Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.




Summary

The paper presents a rank-1 parameterization that reduces the parameter and memory cost of Bayesian neural networks while improving their accuracy and calibration.
It leverages low-dimensional subspaces and heavy-tailed priors to enable scalable uncertainty estimation and robustness across diverse benchmarks.
Empirical results show enhanced log-likelihood, resilience to distribution shifts, and practical improvements on datasets like ImageNet and CIFAR-10-C.

Efficient and Scalable Bayesian Neural Networks with Rank-1 Factors

The paper "Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors" presents an approach to improving the efficiency and scalability of Bayesian neural networks (BNNs) through a rank-1 parameterization. This parameterization addresses two notable challenges in the current use of BNNs: underfitting at scale, and the parameter overhead of maintaining a distribution over every weight. Here, we explore the core ideas, empirical findings, and future implications presented in the paper.

Bayesian neural networks hold promise for uncertainty estimation and robustness in modern deep learning tasks. However, BNNs frequently suffer from underfitting, partly because weight priors and variational posteriors are difficult to configure and partly because sampling weights during training adds variance. Moreover, conventional BNN techniques significantly inflate the number of parameters because they model a distribution over every weight. Against this backdrop, the paper proposes a rank-1 parameterization for BNNs, in which each weight matrix involves only a distribution on a rank-1 subspace. The remaining weights are treated deterministically, which is how the method aims for state-of-the-art performance in a parameter-efficient manner.
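The summary above does not spell out the exact form of the rank-1 factors. A minimal sketch, assuming the BatchEnsemble-style construction in which a shared deterministic matrix `W` is multiplied elementwise by the outer product of two random vectors `r` and `s`, is shown below. The Gaussian variational parameters (`r_mean`, `r_std`, `s_mean`, `s_std`) and the layer sizes are hypothetical values chosen for illustration only.

```python
import numpy as np


def rank1_dense(x, W, r, s):
    """Dense layer whose effective weight is W * outer(r, s).

    x: (batch, d_in), W: (d_in, d_out), r: (d_in,), s: (d_out,)
    """
    # Scaling the inputs by r and the outputs by s gives the same result as
    # x @ (W * np.outer(r, s)) without ever forming the perturbed matrix.
    return ((x * r) @ W) * s


rng = np.random.default_rng(0)
d_in, d_out, batch = 8, 4, 3
x = rng.normal(size=(batch, d_in))
W = rng.normal(size=(d_in, d_out))  # shared, deterministic weights

# Hypothetical Gaussian variational parameters for the rank-1 factors.
r_mean, r_std = np.ones(d_in), 0.1 * np.ones(d_in)
s_mean, s_std = np.ones(d_out), 0.1 * np.ones(d_out)

# Reparameterized samples: only d_in + d_out random numbers per layer.
r = r_mean + r_std * rng.normal(size=d_in)
s = s_mean + s_std * rng.normal(size=d_out)

fast = rank1_dense(x, W, r, s)
explicit = x @ (W * np.outer(r, s))
```

Under this assumption, `fast` and `explicit` agree up to floating-point error, so sampling a weight matrix costs only two extra elementwise multiplications per layer.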

The paper identifies two primary challenges in the current usage of BNNs: computational inefficiency and the difficulty of choosing a parameterization that trains well. To tackle these, it draws on efficient ensembling methods in deep learning and on the observation that neural network weights have low intrinsic dimensionality. By restricting the weight distributions to a rank-1 subspace, the authors report considerable variance reduction. This lets the model explore solutions in a constrained and computationally feasible manner, supporting better scalability to larger datasets and complex architectures without a large increase in parameter count.
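To make the parameter-count argument concrete, here is a rough sketch for a single dense layer. It assumes a mean-field Gaussian BNN stores a mean and a standard deviation for every weight, and that each rank-1 mixture component stores a mean and a standard deviation for its two vectors. The function names and the 1024 x 1024 layer are hypothetical, and the overhead for one toy layer is not meant to reproduce the 0.4% ResNet-50 figure quoted in the abstract.

```python
def deterministic_params(d_in, d_out):
    """Weights of a plain dense layer (bias ignored)."""
    return d_in * d_out


def mean_field_params(d_in, d_out):
    """Mean and standard deviation for every weight."""
    return 2 * d_in * d_out


def rank1_params(d_in, d_out, components=1):
    """Shared weights plus mean/std vectors r and s per mixture component."""
    return d_in * d_out + components * 2 * (d_in + d_out)


d_in, d_out = 1024, 1024
base = deterministic_params(d_in, d_out)
full = mean_field_params(d_in, d_out)
rank1 = rank1_params(d_in, d_out, components=10)

# Relative increase over the deterministic layer.
full_overhead = (full - base) / base
rank1_overhead = (rank1 - base) / base
```

The extra cost of the rank-1 layer grows with `d_in + d_out` rather than `d_in * d_out`, which is why adding mixture components stays cheap as layers get wider.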

Empirical evaluations cover ImageNet, CIFAR-10, CIFAR-100, and MIMIC-III. Across these benchmarks, the paper reports that rank-1 BNNs achieve state-of-the-art log-likelihood, accuracy, and calibration relative to deterministic and ensemble baselines, both on test sets and on corrupted out-of-distribution variants. The gains are most visible in robustness to distribution shift, for example on corrupted datasets such as CIFAR-10-C.
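The metrics named above can be sketched as follows. This is an illustrative example, not the paper's evaluation code: it assumes predictions are averaged over `K` mixture components or posterior samples, and that calibration is summarized with expected calibration error (ECE), which the summary does not name explicitly. The toy probabilities and labels are made up.

```python
import numpy as np


def predictive_mean(member_probs):
    """Average class probabilities over K members; input shape (K, N, C)."""
    return member_probs.mean(axis=0)


def log_likelihood(probs, labels):
    """Mean log-probability assigned to the true labels."""
    return float(np.mean(np.log(probs[np.arange(len(labels)), labels])))


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted gap between confidence and accuracy over confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)


# Toy predictions from K=2 members on N=2 examples with C=2 classes.
member_probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
])
labels = np.array([0, 1])

probs = predictive_mean(member_probs)
ll = log_likelihood(probs, labels)
ece = expected_calibration_error(probs, labels)
```

Averaging probabilities rather than logits is a design choice made here for simplicity; it matches the usual ensemble-style predictive distribution.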

A notable highlight of the study is the use of heavy-tailed prior distributions, such as the Cauchy, which the authors find improve robustness without sacrificing accuracy. This goes beyond earlier empirical work on such priors, which was largely limited to compression tasks, and suggests they can also improve generalization in BNNs.
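One way to see what "heavy-tailed" means in practice is to compare log-densities far from the center. The sketch below uses the standard Gaussian and Cauchy density formulas; the location, scale, and the evaluation point of 5 are arbitrary choices for illustration and are not taken from the paper.

```python
import math


def gaussian_logpdf(x, loc=0.0, scale=1.0):
    """Log-density of a normal distribution."""
    z = (x - loc) / scale
    return -0.5 * math.log(2.0 * math.pi * scale**2) - 0.5 * z * z


def cauchy_logpdf(x, loc=0.0, scale=1.0):
    """Log-density of a Cauchy distribution."""
    z = (x - loc) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))


# Near the center the Gaussian assigns slightly more density ...
center_gap = gaussian_logpdf(0.0) - cauchy_logpdf(0.0)

# ... but five scale units away the Cauchy penalty is far milder,
# because it grows logarithmically instead of quadratically.
tail_gaussian = gaussian_logpdf(5.0)
tail_cauchy = cauchy_logpdf(5.0)
```

A prior with this shape penalizes occasional large deviations much less than a Gaussian does, which is the property the heavy-tailed choice relies on.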

The research has several implications. First, scalable and efficient BNNs can provide more accurate uncertainty estimates in real-world applications where robustness and reliability are critical. Second, Bayesian methods can be incorporated into larger models and more complex networks without the prohibitive cost typically associated with them. Third, the methodology raises questions about the role of heavy-tailed distributions in deep learning models, pointing to future research on training stability and convergence in larger or recurrent architectures.

Moving forward, the authors propose further exploration into scaling rank-1 BNNs to larger models and incorporating MCMC methods to exploit the benefits of efficient Bayesian sampling. Additionally, higher rank factors and their empirical impact remain areas for potential investigation, given the promising results with rank-1 parameterization.

In summary, the paper addresses the computational inefficiency and underfitting of Bayesian neural networks. By combining a rank-1 parameterization with mixture variational distributions, the authors lay out a framework that is well motivated and empirically validated across image and clinical time-series benchmarks, making Bayesian approaches more practical for large-scale deep learning.
