Agnostic Sample Compression Schemes for Regression (1810.01864v2)
Abstract: We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes whose size is exponential in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, we construct an approximate compression scheme of size linear in the dimension. Moreover, for the $\ell_1$ and $\ell_\infty$ losses, we even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of size equal to its pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth's classic sample compression conjecture for realizable-case classification.
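To make the $\ell_1$ (least absolute deviations) claim concrete: an optimal LAD hyperplane can, generically, be taken to interpolate $d+1$ of the sample points, so storing just those points suffices to reconstruct the fitted hypothesis exactly. The following minimal Python sketch illustrates this idea; it is an illustration, not necessarily the paper's construction, and it assumes the sample is in general position and that the LP solver returns a basic (vertex) optimum. All function names here (`lad_fit`, etc.) are ours.

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least absolute deviations (l1) regression via an LP:
    minimize sum_i u_i  subject to  -u_i <= y_i - (w . x_i + b) <= u_i."""
    n, d = X.shape
    A = np.hstack([X, np.ones((n, 1))])            # design matrix with intercept column
    k = d + 1                                      # number of parameters (w, b)
    c = np.concatenate([np.zeros(k), np.ones(n)])  # objective: sum of slack variables u
    A_ub = np.block([[ A, -np.eye(n)],             #  A @ theta - u <=  y
                     [-A, -np.eye(n)]])            # -A @ theta - u <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * k + [(0, None)] * n  # theta free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    assert res.success
    return res.x[:k]                               # theta = (w, b)

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 + rng.laplace(scale=0.3, size=n)

theta = lad_fit(X, y)
resid = y - np.hstack([X, np.ones((n, 1))]) @ theta

# "Compress": a basic optimal LP solution generically has zero residual on
# exactly d+1 sample points; keep only those points.
keep = np.argsort(np.abs(resid))[: d + 1]

# "Decompress": the kept points pin down the hyperplane by interpolation
# (in general position the (d+1) x (d+1) system below is invertible).
A_keep = np.hstack([X[keep], np.ones((d + 1, 1))])
theta_rec = np.linalg.solve(A_keep, y[keep])

print(np.max(np.abs(theta - theta_rec)))  # ~0: the same hypothesis is recovered
```

Under these general-position assumptions, the $d+1$ stored points uniquely determine the fitted hyperplane, matching the "size linear in the dimension" form of the $\ell_1$ result stated above.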
- Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM (JACM), 44(4):615–631, 1997.
- A theory of PAC learnability of partial concept classes. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 658–671. IEEE, 2022.
- Function learning from interpolation. Combinatorics, Probability and Computing, 9(3):213–225, 2000.
- Neural network learning: Theoretical foundations, volume 9. Cambridge University Press, Cambridge, 1999.
- Near-optimal sample complexity bounds for robust learning of Gaussian mixtures via compression schemes. Journal of the ACM (JACM), 67(6):1–42, 2020.
- Adversarially robust PAC learnability of real-valued functions. In International Conference on Machine Learning, pages 1172–1199. PMLR, 2023.
- A characterization of semi-supervised adversarially robust PAC learnability. Advances in Neural Information Processing Systems, 35:23646–23659, 2022.
- Optimal learners for realizable regression: PAC learning and online learning. Advances in Neural Information Processing Systems, 2023.
- Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56(2):174–190, 1998.
- Combinatorial variability of Vapnik-Chervonenkis classes with applications to sample compression schemes. Discrete Applied Mathematics, 86(1):3–25, 1998.
- Proper learning, Helly number, and an optimal SVM bound. In Conference on Learning Theory, pages 582–609. PMLR, 2020.
- A characterization of multiclass learnability. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 943–955. IEEE, 2022.
- Externally definable sets and dependent pairs. Israel Journal of Mathematics, 194(1):409–425, 2013.
- Optimal learners for multiclass problems. In Conference on Learning Theory, pages 287–316. PMLR, 2014.
- Multiclass learnability and the ERM principle. Journal of Machine Learning Research, 16(1):2377–2404, 2015.
- Ofir David, Shay Moran, and Amir Yehudayoff. Supervised learning through the lens of compression. In Advances in Neural Information Processing Systems, pages 2784–2792, 2016.
- Yadolah Dodge. Least absolute deviation regression. The Concise Encyclopedia of Statistics, pages 299–302, 2008.
- Sally Floyd. Space-bounded learning and the Vapnik-Chervonenkis dimension. In Proceedings of the Second Annual Workshop on Computational Learning Theory, pages 349–364. Morgan Kaufmann Publishers Inc., 1989.
- Sally Floyd and Manfred K. Warmuth. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269–304, 1995.
- A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
- Near-optimal sample compression for nearest neighbors. Advances in Neural Information Processing Systems, 27, 2014.
- PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification. Machine Learning, 59(1-2):55–76, 2005.
- Agnostic sample compression for linear regression. arXiv preprint arXiv:1810.01864, 2018.
- Sample compression for real-valued learners. In Algorithmic Learning Theory, pages 466–488. PMLR, 2019.
- David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
- Learning integer lattices. SIAM Journal on Computing, 21(2):240–266, 1992.
- Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464–497, 1994.
- Balázs Kégl. Robust regression by boosting the median. In Learning Theory and Kernel Machines, pages 258–272. Springer, 2003.
- Primal and dual combinatorial dimensions. arXiv preprint arXiv:2108.10037, 2021.
- Fat-shattering dimension of $k$-fold maxima. arXiv preprint arXiv:2110.04763, 2021.
- Nearest-neighbor sample compression: Efficiency, consistency, infinite dimensions. Advances in Neural Information Processing Systems, 30, 2017.
- Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8:2047–2081, 2007.
- Nick Littlestone and Manfred K. Warmuth. Relating data compression and learnability. Technical report, Department of Computer and Information Sciences, University of California, Santa Cruz, 1986.
- Honest compressions and their application to compression schemes. In Conference on Learning Theory, pages 77–92, 2013.
- Shahar Mendelson. Learning without concentration. Journal of the ACM, 62(3):21:1–21:25, 2015.
- VC classes are adversarially robustly learnable, but only improperly. In Conference on Learning Theory, pages 2512–2530. PMLR, 2019.
- Reducing adversarially robust learning to non-robust PAC learning. Advances in Neural Information Processing Systems, 33:14626–14637, 2020.
- Adversarially robust learning with unknown perturbation sets. In Conference on Learning Theory, pages 3452–3482. PMLR, 2021.
- Adversarially robust learning: A generic minimax optimal learner and characterization. Advances in Neural Information Processing Systems, 35:37458–37470, 2022.
- Sample compression schemes for VC classes. Journal of the ACM (JACM), 63(3):1–10, 2016.
- Teaching and compressing for low VC-dimension. In A Journey Through Discrete Mathematics, pages 633–656. Springer, 2017.
- Chirag Pabbaraju. Multiclass learnability does not imply sample compression. arXiv preprint arXiv:2308.06424, 2023.
- David Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.
- David Pollard. Empirical Processes: Theory and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Volume 2. Institute of Mathematical Statistics, 1990.
- A geometric approach to sample compression. Journal of Machine Learning Research, 13(4), 2012.
- Shifting: One-inclusion mistake bounds and sample compression. Journal of Computer and System Sciences, 75(1):37–59, 2009.
- Manfred K. Warmuth. Compressing to VC dimension many points. In Proceedings of the 16th Conference on Learning Theory, 2003.
- A compression technique for analyzing disagreement-based active learning. Journal of Machine Learning Research, 16:713–745, 2015.
- Idan Attias
- Steve Hanneke
- Aryeh Kontorovich
- Menachem Sadigurschi