Tighter Information-Theoretic Generalization Bounds from Supersamples (2302.02432v3)
Abstract: In this work, we present a variety of novel information-theoretic generalization bounds for learning algorithms, in the supersample setting of Steinke & Zakynthinou (2020), the setting of the "conditional mutual information" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating loss values with a Rademacher sequence (and its shifted variants). The presented bounds include square-root bounds, fast-rate bounds (including those based on variance and sharpness), and bounds for interpolating algorithms. We show, theoretically or empirically, that these bounds are tighter than all information-theoretic bounds known to date in the same supersample setting.
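To make the construction concrete, the following is a brief sketch of the supersample setting and the loss-pair projection the abstract refers to. The displayed square-root bound is a representative form under an assumed loss bounded in [0, 1]; the disintegrated mutual information notation $I^{\tilde{Z}}(\cdot;\cdot)$ and the exact constants are assumptions of this sketch, not quotations from the paper.

```latex
% Supersample setting (Steinke & Zakynthinou, 2020): a matrix of 2n i.i.d.
% instances, with a uniform binary selector choosing, per row, which column
% is used for training; the other column is held out for testing.
\[
  \tilde{Z} = \bigl(\tilde{Z}_{i,0}, \tilde{Z}_{i,1}\bigr)_{i=1}^{n},
  \qquad U_1, \dots, U_n \overset{\text{iid}}{\sim} \mathrm{Unif}\{0,1\}.
\]
% The "loss pair" of row i is projected down to a single number, the loss
% difference, for hypothesis W and loss function \ell:
\[
  \Delta_i \;=\; \ell\bigl(W, \tilde{Z}_{i,1}\bigr)
           \;-\; \ell\bigl(W, \tilde{Z}_{i,0}\bigr).
\]
% A representative square-root bound, correlating \Delta_i with the selector
% U_i (which plays the role of a Rademacher variable via the map 2U_i - 1):
\[
  \bigl|\mathbb{E}\,\mathrm{gen}(W)\bigr|
  \;\le\; \frac{1}{n} \sum_{i=1}^{n}
  \mathbb{E}_{\tilde{Z}} \sqrt{2\, I^{\tilde{Z}}(\Delta_i; U_i)}.
\]
```

The intuition behind the projection: since $\Delta_i$ is a function of the loss pair, the data-processing inequality guarantees that measuring information through the scalar $\Delta_i$ can only shrink the mutual information relative to measuring it through the full pair.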
- Chaining mutual information and tightening generalization bounds. Advances in Neural Information Processing Systems, 31, 2018.
- Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
- Tightening mutual information based bounds on generalization error. In 2019 IEEE International Symposium on Information Theory (ISIT), pp. 587–591. IEEE, 2019.
- Catoni, O. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. Vol. 56. Lecture Notes - Monograph Series. Institute of Mathematical Statistics, 2007.
- Chained generalisation bounds. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pp. 4212–4257. PMLR, 02–05 Jul 2022.
- Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, USA, 2006. ISBN 0471241954.
- Villani, C. Optimal Transport: Old and New (Grundlehren der mathematischen Wissenschaften, 338). Springer, 2008.
- ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
- Sliced mutual information: A scalable measure of statistical dependence. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Advances in Neural Information Processing Systems, 2021.
- $k$-sliced mutual information: A quantitative study of scalability with dimension. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022.
- PAC-Bayes, MAC-Bayes and conditional mutual information: Fast rate bounds that handle general VC classes. In Conference on Learning Theory. PMLR, 2021.
- Conditioning and processing: Techniques to improve information-theoretic generalization bounds. Advances in Neural Information Processing Systems, 33:16457–16467, 2020.
- Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms. Advances in Neural Information Processing Systems, 2020.
- Towards a unified information-theoretic framework for generalization. Advances in Neural Information Processing Systems, 34:26370–26381, 2021.
- Understanding generalization via leave-one-out conditional mutual information. In 2022 IEEE International Symposium on Information Theory (ISIT), pp. 2487–2492. IEEE, 2022.
- Limitations of information-theoretic generalization bounds for gradient descent methods in stochastic convex optimization. In International Conference on Algorithmic Learning Theory, pp. 663–706. PMLR, 2023.
- Information-theoretic generalization bounds for black-box learning algorithms. In Advances in Neural Information Processing Systems, 2021.
- Information-theoretic characterization of the generalization error for iterative semi-supervised learning. Journal of Machine Learning Research, 23:1–52, 2022.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
- Generalization bounds via information density and conditional information density. IEEE Journal on Selected Areas in Information Theory, 1(3):824–839, 2020.
- Fast-rate loss bounds via conditional information measures with applications to neural networks. In 2021 IEEE International Symposium on Information Theory (ISIT), pp. 952–957. IEEE, 2021.
- A new family of generalization bounds using samplewise evaluated CMI. In Advances in Neural Information Processing Systems, 2022a.
- Evaluated CMI bounds for meta learning: Tightness and expressiveness. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022b.
- Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
- Three factors influencing minima in SGD. arXiv preprint arXiv:1711.04623, 2017.
- On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. Advances in Neural Information Processing Systems, 21, 2008.
- Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.
- Information-theoretic generalization bounds for SGLD via data-dependent estimates. Advances in Neural Information Processing Systems, 2019.
- Information-theoretic generalization bounds for stochastic gradient descent. In Conference on Learning Theory. PMLR, 2021.
- Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Generalization error bounds for noisy, iterative algorithms. In 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, 2018.
- Lecture notes on information theory. Lecture Notes for 6.441 (MIT), ECE 563 (UIUC), STAT 364 (Yale), 2019.
- Non-convex learning via stochastic gradient Langevin dynamics: A nonasymptotic analysis. In Conference on Learning Theory, pp. 1674–1703. PMLR, 2017.
- On leave-one-out conditional mutual information for generalization. In Advances in Neural Information Processing Systems, 2022.
- On random subset generalization error bounds and the stochastic gradient Langevin dynamics algorithm. In 2020 IEEE Information Theory Workshop (ITW), pp. 1–5. IEEE, 2021.
- Tighter expected generalization error bounds via Wasserstein distance. Advances in Neural Information Processing Systems, 34, 2021.
- Controlling bias in adaptive data analysis using information theory. In Artificial Intelligence and Statistics. PMLR, 2016.
- How much does your data exploration overfit? Controlling bias via information usage. IEEE Transactions on Information Theory, 66(1):302–323, 2019.
- PAC-Bayesian inequalities for martingales. IEEE Transactions on Information Theory, 58(12):7086–7093, 2012.
- Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423, 1948. doi: 10.1002/j.1538-7305.1948.tb01338.x.
- Reasoning about generalization via conditional mutual information. In Conference on Learning Theory. PMLR, 2020.
- PAC-Bayes-Empirical-Bernstein inequality. Advances in Neural Information Processing Systems, 26, 2013.
- Vapnik, V. Statistical learning theory. Wiley, 1998. ISBN 978-0-471-03003-4.
- Analyzing the generalization capability of SGLD using properties of Gaussian channels. Advances in Neural Information Processing Systems, 34:24222–24234, 2021.
- On the generalization of models trained with SGD: Information-theoretic bounds and implications. In International Conference on Learning Representations, 2022a.
- Two facets of SDE under an information-theoretic lens: Generalization of SGD via training trajectories and via terminal states. arXiv preprint arXiv:2211.10691, 2022b.
- Information-theoretic analysis of unsupervised domain adaptation. In International Conference on Learning Representations, 2023.
- Information-theoretic analysis for transfer learning. In 2020 IEEE International Symposium on Information Theory (ISIT), pp. 2819–2824. IEEE, 2020.
- Information-theoretic analysis of generalization capability of learning algorithms. Advances in Neural Information Processing Systems, 2017.
- Fast-rate PAC-Bayes generalization bounds via shifted Rademacher processes. Advances in Neural Information Processing Systems, 32, 2019.
- Yeung, R. W. A new outlook on Shannon's information measures. IEEE Transactions on Information Theory, 37(3):466–474, 1991.
- Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations, 2017.
- Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115, 2021.
- Localization of VC classes: Beyond local Rademacher complexities. Theoretical Computer Science, 742:27–49, 2018.
- Individually conditional individual mutual information bound on generalization error. IEEE Transactions on Information Theory, 68(5):3304–3316, 2022a.
- Stochastic chaining and strengthened information-theoretic generalization bounds. In 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022b.
- Ziqiao Wang
- Yongyi Mao