
Likelihood Based Inference in Fully and Partially Observed Exponential Family Graphical Models with Intractable Normalizing Constants (2404.17763v1)

Published 27 Apr 2024 in stat.ME, stat.CO, and stat.ML

Abstract: Probabilistic graphical models that encode an underlying Markov random field are fundamental building blocks of generative modeling to learn latent representations in modern multivariate data sets with complex dependency structures. Among these, the exponential family graphical models are especially popular, given their fairly well-understood statistical properties and computational scalability to high-dimensional data based on pseudo-likelihood methods. These models have been successfully applied in many fields, such as the Ising model in statistical physics and count graphical models in genomics. Another strand of models allows some nodes to be latent, so as to allow the marginal distribution of the observable nodes to depart from the exponential family to capture more complex dependence. These approaches form the basis of generative models in artificial intelligence, such as the Boltzmann machines and their restricted versions. A fundamental barrier to likelihood-based (i.e., both maximum likelihood and fully Bayesian) inference in both fully and partially observed cases is the intractability of the likelihood. The usual workaround is to adopt pseudo-likelihood based approaches, following the pioneering work of Besag (1974). The goal of this paper is to demonstrate that full likelihood based analysis of these models is feasible in a computationally efficient manner. The chief innovation lies in using a technique of Geyer (1991) to estimate the intractable normalizing constant, as well as its gradient, for intractable graphical models. Extensive numerical results, supporting theory, and comparisons with pseudo-likelihood based approaches demonstrate the applicability of the proposed method.
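To make the abstract's central idea concrete, here is a minimal sketch of Geyer's (1991) Monte Carlo maximum likelihood trick for a pairwise Ising model: samples are drawn once at a fixed reference parameter theta0, and importance weights then estimate both the log normalizing constant ratio log Z(theta) - log Z(theta0) and its gradient, the expected sufficient statistic E_theta[T(X)]. This is an illustrative sketch under our own assumptions, not the paper's implementation; the function names, the single-site Gibbs sampler, and the toy chain graph are all chosen for exposition.

```python
# Illustrative sketch of Geyer's (1991) Monte Carlo MLE idea for a
# pairwise Ising model p_theta(x) proportional to exp(theta . T(x)),
# with x in {-1, +1}^d. Not the paper's code; names and the toy graph
# below are our own choices.
import numpy as np

def suff_stat(x, pairs):
    """T(x): node statistics x_j followed by edge statistics x_j * x_k."""
    return np.concatenate([x, [x[j] * x[k] for j, k in pairs]])

def gibbs_sample(theta, pairs, d, n_samples, burn=500, seed=0):
    """Single-site Gibbs sampler targeting p_theta; returns (n_samples, d)."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=d)
    out = []
    for t in range(burn + n_samples):
        for j in range(d):
            h = theta[j]  # local field at site j
            for i, (a, b) in enumerate(pairs):
                if a == j:
                    h += theta[d + i] * x[b]
                elif b == j:
                    h += theta[d + i] * x[a]
            # p(x_j = +1 | rest) = exp(h) / (exp(h) + exp(-h))
            x[j] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * h)) else -1.0
        if t >= burn:
            out.append(x.copy())
    return np.asarray(out)

def log_z_ratio_and_grad(theta, theta0, samples, pairs):
    """Geyer-style importance-sampling estimates from x_i ~ p_theta0:
    log Z(theta) - log Z(theta0) = log E_theta0[exp((theta - theta0) . T(X))]
    and grad_theta log Z(theta) = E_theta[T(X)] via self-normalized weights."""
    T = np.asarray([suff_stat(x, pairs) for x in samples])  # (m, p)
    a = T @ (theta - theta0)                                # log weights
    m = a.max()                                             # stabilize exp
    w = np.exp(a - m)
    log_ratio = m + np.log(w.mean())
    grad_log_z = (w[:, None] * T).sum(axis=0) / w.sum()
    return log_ratio, grad_log_z

# Toy usage on a 4-node chain graph.
d, pairs = 4, [(0, 1), (1, 2), (2, 3)]
theta0 = np.zeros(d + len(pairs))               # reference parameter
samples = gibbs_sample(theta0, pairs, d, n_samples=2000)
theta = theta0 + 0.3
log_ratio, grad_log_z = log_z_ratio_and_grad(theta, theta0, samples, pairs)
```

With an observation x_obs, the log-likelihood is l(theta) = theta . T(x_obs) - log Z(theta) and its gradient is T(x_obs) - E_theta[T(X)], so the two estimated quantities above are exactly the intractable pieces; plugging them into a generic optimizer or MCMC sampler is the sense in which full likelihood based analysis becomes feasible without resorting to pseudo-likelihood.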

References (70)
  1. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski (1985). A learning algorithm for Boltzmann machines. Cognitive Science 9, 147–169.
  2. C. C. Aggarwal (2018). Restricted Boltzmann Machines, pages 235–270. Springer International Publishing, Cham.
  3. J. Ashford and R. Sowden (1970). Multi-variate probit analysis. Biometrics 26, 535–546.
  4. O. Barndorff-Nielsen (1978). Information and exponential families in statistical theory. John Wiley & Sons.
  5. J. Besag (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological) 36, 192–225.
  6. M. Betancourt (2011). Nested sampling with constrained Hamiltonian Monte Carlo. In AIP Conference Proceedings, volume 1305, pages 165–172. American Institute of Physics.
  7. A. Bhadra, J. Datta, N. G. Polson, and B. Willard (2017). The horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis 12, 1105–1131.
  8. A. Bhattacharya, A. Chakraborty, and B. K. Mallick (2016). Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika 103, 985–991.
  9. A. Bhadra, J. Datta, N. G. Polson, and B. Willard (2016). Sub-optimality of some continuous shrinkage priors. Stochastic Processes and their Applications 126, 3828–3842.
  10. A. Bhattacharya, D. Pati, N. S. Pillai, and D. B. Dunson (2015). Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association 110, 1479–1490.
  11. S. Boucheron, G. Lugosi, and O. Bousquet (2004). Concentration inequalities. In Summer School on Machine Learning, pages 208–240. Springer.
  12. C. G. Broyden (1970). The convergence of a class of double-rank minimization algorithms 1. general considerations. IMA Journal of Applied Mathematics 6, 76–90.
  13. P. Bühlmann and S. van de Geer (2011). Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media.
  14. A. Chakraborty, A. Bhattacharya, and B. K. Mallick (2020). Bayesian sparse multiple regression for simultaneous rank reduction and variable selection. Biometrika 107, 205–221.
  15. A. Chakraborty, R. Ou, and D. B. Dunson (2023). Bayesian inference on high-dimensional multivariate binary responses. Journal of the American Statistical Association, 1–12.
  16. D. Chicco, N. Tötsch, and G. Jurman (2021). The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining 14, 1–22.
  17. L. Deng (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine 29, 141–142.
  18. A. Fischer and C. Igel (2012). An introduction to restricted Boltzmann machines. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3-6, 2012. Proceedings 17, pages 14–36. Springer.
  19. R. A. Fisher (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A 222, 309–368.
  20. J. Friedman, T. Hastie, and R. Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
  21. E. I. George and R. E. McCulloch (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88, 881–889.
  22. C. J. Geyer (1991). Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proc. 23rd Symposium of the Interface Foundation, pages 156–163.
  23. C. J. Geyer and E. A. Thompson (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society: Series B 54, 657–683.
  24. S. Ghosal and A. W. van der Vaart (2017). Fundamentals of nonparametric Bayesian inference, volume 44. Cambridge University Press.
  25. D. O. Hebb (1949). The organization of behavior: A neuropsychological theory. Wiley.
  26. G. E. Hinton (2002). Training products of experts by minimizing contrastive divergence. Neural Computation 14, 1771–1800.
  27. G. E. Hinton (2007). Boltzmann machine. Scholarpedia 2, 1668.
  28. G. E. Hinton, B. Sallans, and Z. Ghahramani (1998). A hierarchical community of experts. In Learning in graphical models, pages 479–494. Springer.
  29. J. J. Hopfield (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79, 2554–2558.
  30. E. Ising (1924). Beitrag zur Theorie des Ferro- und Paramagnetismus. PhD thesis, Grefe & Tiedemann, Hamburg.
  31. High WHSC1L1 expression reduces survival rates in operated breast cancer patients with decreased CD8+ T cells: machine learning approach. Journal of Personalized Medicine 11, 636.
  32. S. L. Lauritzen (1996). Graphical models, volume 17. Clarendon Press.
  33. N. Le Roux and Y. Bengio (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural computation 20, 1631–1649.
  34. F. Liang (2010). A double Metropolis–Hastings sampler for spatial models with intractable normalizing constants. Journal of Statistical Computation and Simulation 80, 1007–1022.
  35. N. Meinshausen and P. Bühlmann (2006). High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics 34, 1436 – 1462.
  36. N. Meinshausen and P. Bühlmann (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 417–473.
  37. T. J. Mitchell and J. J. Beauchamp (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association 83, 1023–1032.
  38. J. Møller, A. N. Pettitt, R. Reeves, and K. K. Berthelsen (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93, 451–458.
  39. G. Montufar and N. Ay (2011). Refinements of universal approximation results for deep belief networks and restricted Boltzmann machines. Neural computation 23, 1306–1319.
  40. I. Murray, Z. Ghahramani, and D. J. C. MacKay (2012). MCMC for doubly-intractable distributions. arXiv preprint arXiv:1206.6848.
  41. R. M. Neal (2001). Annealed importance sampling. Statistics and Computing 11, 125–139.
  42. R. M. Neal (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, pages 113–162. Chapman & Hall/CRC.
  43. W. K. Newey (1991). Uniform convergence in probability and stochastic equicontinuity. Econometrica 59, 1161–1167.
  44. N. Parikh and S. Boyd (2014). Proximal algorithms. Foundations and Trends in Optimization 1, 127–239.
  45. T. Park and G. Casella (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686.
  46. I. Pinelis (2020). Exact lower and upper bounds on the incomplete gamma function. arXiv preprint arXiv:2005.06384 .
  47. R. B. Potts (1952). Some generalized order-disorder transformations. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 48, pages 106–109. Cambridge University Press.
  48. Aberrations of chromosomes 1 and 16 in breast cancer: a framework for cooperation of transcriptionally dysregulated genes. Cancers 13, 1585.
  49. C. R. Rao (1945). Information and accuracy attainable in the estimation of statistical parameters. Reprinted in S. Kotz and N. L. Johnson (eds.), Breakthroughs in Statistics Volume I: Foundations and Basic Theory, pages 235–248.
  50. P. Ravikumar, M. J. Wainwright, and J. D. Lafferty (2010). High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics 38, 1287–1319.
  51. L. Roach and X. Gao (2023). Graphical local genetic algorithm for high-dimensional log-linear models. Mathematics 11, 2514.
  52. A. J. Rothman, E. Levina, and J. Zhu (2010). Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics 19, 947–962.
  53. R. Salakhutdinov and G. Hinton (2009). Deep Boltzmann machines. In Artificial intelligence and statistics, pages 448–455. PMLR.
  54. R. Salakhutdinov, A. Mnih, and G. Hinton (2007). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, pages 791–798.
  55. R. Salakhutdinov and I. Murray (2008). On the quantitative analysis of deep belief networks. In Proceedings of the 25th International Conference on Machine Learning, pages 872–879.
  56. P. Smolensky (1986). Information processing in dynamical systems: Foundations of harmony theory. In Parallel distributed processing: Explorations in the microstructure of cognition, pages 194–281. MIT Press, Cambridge, MA.
  57. J. Stoehr, A. Benson, and N. Friel (2019). Noisy Hamiltonian Monte Carlo for doubly intractable distributions. Journal of Computational and Graphical Statistics 28, 220–232.
  58. I. Sutskever and T. Tieleman (2010). On the convergence properties of contrastive divergence. In Proc. AISTATS, pages 789–795.
  59. TCGA (2012). Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70.
  60. R. Tibshirani (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58, 267–288.
  61. R. S. Varga (2010). Geršgorin and his circles, volume 36. Springer Science & Business Media.
  62. M. J. Wainwright and M. I. Jordan (2008). Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning 1, 1–305.
  63. C. F. J. Wu (1983). On the Convergence Properties of the EM Algorithm. The Annals of Statistics 11, 95 – 103.
  64. TMPRSS2 serves as a prognostic biomarker and correlated with immune infiltrates in breast invasive cancer and lung adenocarcinoma. Frontiers in Molecular Biosciences 9, 647826.
  65. E. Yang, P. Ravikumar, G. Allen, and Z. Liu (2012). Graphical models via generalized linear models. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc.
  66. E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu (2015). Graphical models via univariate exponential family distributions. Journal of Machine Learning Research 16, 3813–3847.
  67. E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu (2013). On Poisson graphical models. Advances in Neural Information Processing Systems 26.
  68. Y.-L. Yu (2013). On decomposing the proximal map. Advances in Neural Information Processing Systems 26.
  69. M. Yuan, A. Ekici, Z. Lu, and R. Monteiro (2007). Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69, 329–346.
  70. S. Zhang, L. Yao, A. Sun, and Y. Tay (2019). Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys (CSUR) 52, 1–38.
Authors (3)
  1. Yujie Chen (46 papers)
  2. Anindya Bhadra (27 papers)
  3. Antik Chakraborty (7 papers)
