Learning Robust Statistics for Simulation-based Inference under Model Misspecification (2305.15871v3)
Abstract: Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
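The regularized loss described in the abstract can be illustrated with a minimal sketch. The abstract does not say which discrepancy is used to measure the mismatch between data and model, so the code below assumes a maximum mean discrepancy (MMD) penalty with a Gaussian kernel. The function names, the weight `lam`, and the kernel lengthscale are hypothetical choices for illustration, not the authors' implementation. In the setting the abstract describes, the statistics would come from a learned summary network trained jointly with the inference loss; here they are fixed arrays so the example stays self-contained.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))


def mmd_squared(x, y, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, lengthscale).mean()
    k_yy = gaussian_kernel(y, y, lengthscale).mean()
    k_xy = gaussian_kernel(x, y, lengthscale).mean()
    return k_xx + k_yy - 2.0 * k_xy


def regularized_loss(inference_loss, sim_stats, obs_stats, lam=1.0):
    """Inference loss plus a penalty on the statistics' data-model mismatch.

    Assumption: the penalty is lam * MMD^2 between statistics of simulated
    data and statistics of observed data. `inference_loss` stands in for
    whatever the chosen SBI method minimises (e.g. an NPE loss).
    """
    return inference_loss + lam * mmd_squared(sim_stats, obs_stats)


# Toy statistics: one observed set matches the simulator, the other is shifted.
rng = np.random.default_rng(0)
sim_stats = rng.normal(0.0, 1.0, size=(200, 2))
obs_close = rng.normal(0.0, 1.0, size=(50, 2))
obs_far = rng.normal(3.0, 1.0, size=(50, 2))

loss_close = regularized_loss(1.0, sim_stats, obs_close)
loss_far = regularized_loss(1.0, sim_stats, obs_far)
print(f"penalised loss, matching statistics: {loss_close:.3f}")
print(f"penalised loss, shifted statistics:  {loss_far:.3f}")
```

Under these assumptions, statistics whose observed values fall far from what the simulator produces incur a larger penalty, so an optimiser over the statistics is steered away from them. The weight `lam` trades off this robustness against the original inference objective.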