Tighter Generalisation Bounds via Interpolation (2402.05101v1)

Published 7 Feb 2024 in stat.ML and cs.LG

Abstract: This paper contains a recipe for deriving new PAC-Bayes generalisation bounds based on the $(f, \Gamma)$-divergence and, in addition, presents PAC-Bayes generalisation bounds that interpolate between a series of probability divergences (including, but not limited to, KL, Wasserstein, and total variation), making the best of many worlds depending on the posterior distribution's properties. We explore the tightness of these bounds and connect them to earlier results from statistical learning, which arise as special cases. We also instantiate our bounds as training objectives, yielding non-trivial guarantees and practical performance.
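For context, here is a hedged sketch of the kind of bound involved. This is the classical KL-based PAC-Bayes bound (in Maurer's form), which the paper's $(f, \Gamma)$-based bounds generalise; it is not the paper's own statement. For a loss bounded in $[0,1]$, a prior $P$ chosen before seeing the data, any posterior $Q$, and a sample of size $n$, with probability at least $1 - \delta$,

\[
\mathbb{E}_{h \sim Q}\big[R(h)\big] \;\le\; \mathbb{E}_{h \sim Q}\big[\widehat{R}_n(h)\big] + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\big(2\sqrt{n}/\delta\big)}{2n}}.
\]

The $(f, \Gamma)$-divergence that replaces $\mathrm{KL}$ interpolates between an $f$-divergence and the integral probability metric (IPM) induced by a function class $\Gamma$. Up to regularity conditions (this rendering follows the $(f, \Gamma)$-divergence literature, not a statement from this paper), it admits the infimal-convolution form

\[
D_f^{\Gamma}(Q \,\|\, P) \;=\; \inf_{\eta} \Big\{ \sup_{g \in \Gamma} \big( \mathbb{E}_Q[g] - \mathbb{E}_{\eta}[g] \big) + D_f(\eta \,\|\, P) \Big\},
\]

so, loosely, a very rich $\Gamma$ recovers the plain $f$-divergence $D_f(Q \,\|\, P)$, while making the $f$-divergence penalty prohibitive except at $\eta = P$ recovers the $\Gamma$-IPM between $Q$ and $P$.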

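The abstract's last sentence, on instantiating bounds as training objectives, can be illustrated with a minimal generic sketch: draw weights from a Gaussian posterior by reparameterisation and minimise the empirical risk plus the bound's complexity term. This is a KL-based surrogate of our own devising, not the paper's $(f, \Gamma)$ objective; all names, hyperparameters, and the synthetic data are illustrative assumptions.

import math

import torch
import torch.nn.functional as F

# Minimal sketch of a PAC-Bayes-style training objective. Generic KL-based
# surrogate, NOT the paper's (f, Gamma) objective; everything here is an
# illustrative assumption.
torch.manual_seed(0)
n, d, delta = 200, 5, 0.05
X = torch.randn(n, d)                      # synthetic inputs
y = (X @ torch.ones(d) > 0).float()        # synthetic binary labels

# Gaussian posterior Q = N(mu, diag(sigma^2)); fixed prior P = N(0, I).
mu = torch.zeros(d, requires_grad=True)
rho = torch.full((d,), -1.0, requires_grad=True)   # sigma = softplus(rho) > 0
opt = torch.optim.Adam([mu, rho], lr=1e-2)

for step in range(500):
    sigma = F.softplus(rho)
    w = mu + sigma * torch.randn(d)        # reparameterised draw h ~ Q
    emp_risk = F.binary_cross_entropy_with_logits(X @ w, y)
    # Closed-form KL(N(mu, sigma^2 I) || N(0, I)).
    kl = 0.5 * torch.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * torch.log(sigma))
    # McAllester-style complexity term used as a differentiable surrogate
    # (the bound assumes a [0, 1] loss; BCE is only a proxy here).
    objective = emp_risk + torch.sqrt(
        (kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    )
    opt.zero_grad()
    objective.backward()
    opt.step()

print(f"final surrogate objective: {objective.item():.3f}")

Swapping the KL term for one of the paper's interpolated divergences would change only the penalty computation; the reparameterised sampling and the bound-as-loss structure stay the same.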
