Tighter Information-Theoretic Generalization Bounds from Supersamples (2302.02432v3)

Published 5 Feb 2023 in stat.ML, cs.IT, cs.LG, and math.IT

Abstract: In this work, we present a variety of novel information-theoretic generalization bounds for learning algorithms in the supersample setting of Steinke & Zakynthinou (2020), the setting of the "conditional mutual information" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating the loss values with a Rademacher sequence (and with its shifted variants). The presented bounds include square-root bounds, fast-rate bounds (including bounds based on variance and sharpness), and bounds for interpolating algorithms, among others. We show, theoretically or empirically, that these bounds are tighter than all information-theoretic bounds known to date in the same supersample setting.
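
For context, the display below is an illustrative sketch of the supersample objects the abstract refers to, not a verbatim theorem from the paper; it assumes a loss $\ell$ bounded in $[0,1]$, writes $\mathcal{A}$ for the learning algorithm, and uses $I^{\tilde{Z}}$ for the disintegrated (conditional on the supersample $\tilde{Z}$) mutual information. Selection bits $U_i$ split $\tilde{Z}$ into a training half and a testing half; projecting each loss pair down to the single difference $\Delta L_i$ and correlating it with the Rademacher variables $\epsilon_i=(-1)^{U_i}$ rewrites the expected generalization gap, which square-root bounds of this general flavor then control (exact constants and conditioning in the paper's results may differ):

$$
\tilde{Z}\in\mathcal{Z}^{n\times 2},\qquad U_i\overset{\text{i.i.d.}}{\sim}\mathrm{Unif}\{0,1\},\qquad W=\mathcal{A}\big((\tilde{Z}_{i,U_i})_{i=1}^{n}\big),\qquad \Delta L_i=\ell(W,\tilde{Z}_{i,1})-\ell(W,\tilde{Z}_{i,0}),\qquad \epsilon_i=(-1)^{U_i},
$$

$$
\overline{\mathrm{gen}}\;=\;\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^{n}\epsilon_i\,\Delta L_i\Big]\;\le\;\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{\tilde{Z}}\sqrt{2\,I^{\tilde{Z}}(\Delta L_i;U_i)}.
$$

The fast-rate, variance- and sharpness-based, and interpolation bounds mentioned in the abstract refine this square-root template rather than replace the underlying loss-projection idea.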

References (57)
  1. Chaining mutual information and tightening generalization bounds. Advances in Neural Information Processing Systems, 31, 2018.
  2. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
  3. Tightening mutual information based bounds on generalization error. In 2019 IEEE International Symposium on Information Theory (ISIT), pp. 587–591. IEEE, 2019.
  4. Catoni, O. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. Vol. 56. Lecture Notes - Monograph Series. Institute of Mathematical Statistics, 2007.
  5. Chained generalisation bounds. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp. 4212–4257. PMLR, 02–05 Jul 2022.
  6. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, USA, 2006. ISBN 0471241954.
  7. Villani, C. Optimal Transport: Old and New (Grundlehren der mathematischen Wissenschaften, 338). Springer, 2008.
  8. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
  9. Sliced mutual information: A scalable measure of statistical dependence. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, 2021.
  10. $k$-sliced mutual information: A quantitative study of scalability with dimension. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022.
  11. PAC-Bayes, MAC-Bayes and conditional mutual information: Fast rate bounds that handle general VC classes. In Conference on Learning Theory. PMLR, 2021.
  12. Conditioning and processing: Techniques to improve information-theoretic generalization bounds. Advances in Neural Information Processing Systems, 33:16457–16467, 2020.
  13. Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms. Advances in Neural Information Processing Systems, 2020.
  14. Towards a unified information-theoretic framework for generalization. Advances in Neural Information Processing Systems, 34:26370–26381, 2021.
  15. Understanding generalization via leave-one-out conditional mutual information. In 2022 IEEE International Symposium on Information Theory (ISIT), pp. 2487–2492. IEEE, 2022.
  16. Limitations of information-theoretic generalization bounds for gradient descent methods in stochastic convex optimization. In International Conference on Algorithmic Learning Theory, pp. 663–706. PMLR, 2023.
  17. Information-theoretic generalization bounds for black-box learning algorithms. In Advances in Neural Information Processing Systems, 2021.
  18. Information-theoretic characterization of the generalization error for iterative semi-supervised learning. Journal of Machine Learning Research, 23:1–52, 2022.
  19. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  770–778, 2016.
  20. Generalization bounds via information density and conditional information density. IEEE Journal on Selected Areas in Information Theory, 1(3):824–839, 2020.
  21. Fast-rate loss bounds via conditional information measures with applications to neural networks. In 2021 IEEE International Symposium on Information Theory (ISIT), pp. 952–957. IEEE, 2021.
  22. A new family of generalization bounds using samplewise evaluated CMI. In Advances in Neural Information Processing Systems, 2022a.
  23. Evaluated CMI bounds for meta learning: Tightness and expressiveness. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022b.
  24. Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
  25. Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623, 2017.
  26. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. Advances in Neural Information Processing Systems, 21, 2008.
  27. Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  28. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
  29. Information-theoretic generalization bounds for SGLD via data-dependent estimates. Advances in Neural Information Processing Systems, 2019.
  30. Information-theoretic generalization bounds for stochastic gradient descent. In Conference on Learning Theory. PMLR, 2021.
  31. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  32. Generalization error bounds for noisy, iterative algorithms. In 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018.
  33. Lecture notes on information theory. Lecture notes for 6.441 (MIT), ECE 563 (UIUC), STAT 364 (Yale), 2019.
  34. Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In Conference on Learning Theory, pp. 1674–1703. PMLR, 2017.
  35. On leave-one-out conditional mutual information for generalization. In Advances in Neural Information Processing Systems, 2022.
  36. On random subset generalization error bounds and the stochastic gradient Langevin dynamics algorithm. In 2020 IEEE Information Theory Workshop (ITW), pp. 1–5. IEEE, 2021.
  37. Tighter expected generalization error bounds via Wasserstein distance. Advances in Neural Information Processing Systems, 34, 2021.
  38. Controlling bias in adaptive data analysis using information theory. In Artificial Intelligence and Statistics. PMLR, 2016.
  39. How much does your data exploration overfit? Controlling bias via information usage. IEEE Transactions on Information Theory, 66(1):302–323, 2019.
  40. PAC-Bayesian inequalities for martingales. IEEE Transactions on Information Theory, 58(12):7086–7093, 2012.
  41. Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948. doi: 10.1002/j.1538-7305.1948.tb01338.x.
  42. Reasoning about generalization via conditional mutual information. In Conference on Learning Theory. PMLR, 2020.
  43. PAC-Bayes-Empirical-Bernstein inequality. Advances in Neural Information Processing Systems, 26, 2013.
  44. Vapnik, V. Statistical learning theory. Wiley, 1998. ISBN 978-0-471-03003-4.
  45. Analyzing the generalization capability of SGLD using properties of Gaussian channels. Advances in Neural Information Processing Systems, 34:24222–24234, 2021.
  46. On the generalization of models trained with SGD: Information-theoretic bounds and implications. In International Conference on Learning Representations, 2022a.
  47. Two facets of SDE under an information-theoretic lens: Generalization of SGD via training trajectories and via terminal states. arXiv preprint arXiv:2211.10691, 2022b.
  48. Information-theoretic analysis of unsupervised domain adaptation. In International Conference on Learning Representations, 2023.
  49. Information-theoretic analysis for transfer learning. In 2020 IEEE International Symposium on Information Theory (ISIT), pp. 2819–2824. IEEE, 2020.
  50. Information-theoretic analysis of generalization capability of learning algorithms. Advances in Neural Information Processing Systems, 2017.
  51. Fast-rate PAC-Bayes generalization bounds via shifted Rademacher processes. Advances in Neural Information Processing Systems, 32, 2019.
  52. Yeung, R. W. A new outlook on Shannon’s information measures. IEEE Transactions on Information Theory, 37(3):466–474, 1991.
  53. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations, 2017.
  54. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115, 2021.
  55. Localization of VC classes: Beyond local Rademacher complexities. Theoretical Computer Science, 742:27–49, 2018.
  56. Individually conditional individual mutual information bound on generalization error. IEEE Transactions on Information Theory, 68(5):3304–3316, 2022a.
  57. Stochastic chaining and strengthened information-theoretic generalization bounds. In 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022b.
Authors (2)
  1. Ziqiao Wang (40 papers)
  2. Yongyi Mao (45 papers)
Citations (13)
