
Agnostic Sample Compression Schemes for Regression (1810.01864v2)

Published 3 Oct 2018 in cs.LG, cs.IT, math.IT, math.ST, stat.ML, and stat.TH

Abstract: We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of size equal to its pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth's classic sample compression conjecture for realizable-case classification.
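For readers unfamiliar with the terminology, the following is a standard formalization sketched from the abstract; the exact normalization of the empirical loss below is an assumption rather than a quotation from the paper. A sample compression scheme of size $k$ for a function class $\mathcal{F}$ consists of a compression map $\kappa$ that selects at most $k$ labeled examples (possibly with side information) from a sample $S=((x_1,y_1),\dots,(x_m,y_m))$, and a reconstruction map $\rho$ that outputs a hypothesis $\hat{f}=\rho(\kappa(S))$ using only the selected examples. The scheme is exact for the $\ell_p$ loss if
$$\hat{L}_p(\hat{f};S)\;\le\;\inf_{f\in\mathcal{F}}\hat{L}_p(f;S),\qquad \hat{L}_p(f;S)=\Big(\tfrac{1}{m}\sum_{i=1}^{m}|f(x_i)-y_i|^p\Big)^{1/p},$$
and $\alpha$-approximate if the inequality holds up to an additive slack $\alpha$. The paper's positive results bound $k$ in terms of the fat-shattering dimension (or, for linear regression, the ambient dimension), independently of the sample size $m$.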

References (52)
  1. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM (JACM), 44(4):615–631, 1997.
  2. A theory of PAC learnability of partial concept classes. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 658–671. IEEE, 2022.
  3. Function learning from interpolation. Combinatorics, Probability and Computing, 9(3):213–225, 2000.
  4. Neural network learning: Theoretical foundations, volume 9. Cambridge University Press, Cambridge, 1999.
  5. Near-optimal sample complexity bounds for robust learning of Gaussian mixtures via compression schemes. Journal of the ACM (JACM), 67(6):1–42, 2020.
  6. Adversarially robust PAC learnability of real-valued functions. In International Conference on Machine Learning, pages 1172–1199. PMLR, 2023.
  7. A characterization of semi-supervised adversarially robust PAC learnability. Advances in Neural Information Processing Systems, 35:23646–23659, 2022.
  8. Optimal learners for realizable regression: PAC learning and online learning. Advances in Neural Information Processing Systems, 2023.
  9. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56(2):174–190, 1998.
  10. Combinatorial variability of Vapnik-Chervonenkis classes with applications to sample compression schemes. Discrete Applied Mathematics, 86(1):3–25, 1998.
  11. Proper learning, Helly number, and an optimal SVM bound. In Conference on Learning Theory, pages 582–609. PMLR, 2020.
  12. A characterization of multiclass learnability. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 943–955. IEEE, 2022.
  13. Externally definable sets and dependent pairs. Israel J. Math., 194(1):409–425, 2013.
  14. Optimal learners for multiclass problems. In Conference on Learning Theory, pages 287–316. PMLR, 2014.
  15. Multiclass learnability and the ERM principle. J. Mach. Learn. Res., 16(1):2377–2404, 2015.
  16. Supervised learning through the lens of compression. In Advances in Neural Information Processing Systems, pages 2784–2792, 2016.
  17. Y Dodge. Least absolute deviation regression. The Concise Encyclopedia of Statistics, pages 299–302, 2008.
  18. Sally Floyd. Space-bounded learning and the Vapnik-Chervonenkis dimension. In Proceedings of the Second Annual Workshop on Computational Learning Theory, pages 349–364. Morgan Kaufmann Publishers Inc., 1989.
  19. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269–304, 1995a.
  20. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269–304, 1995b.
  21. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
  22. Near-optimal sample compression for nearest neighbors. Advances in Neural Information Processing Systems, 27, 2014.
  23. PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification. Machine Learning, 59(1-2):55–76, 2005a.
  24. PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification. Machine Learning, 59:55–76, 2005b.
  25. Agnostic sample compression for linear regression. arXiv preprint arXiv:1810.01864, 2018.
  26. Sample compression for real-valued learners. In Algorithmic Learning Theory, pages 466–488. PMLR, 2019.
  27. David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992.
  28. Learning integer lattices. SIAM Journal on Computing, 21(2):240–266, 1992.
  29. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464–497, 1994.
  30. Balázs Kégl. Robust regression by boosting the median. In Learning Theory and Kernel Machines, pages 258–272. Springer, 2003.
  31. Primal and dual combinatorial dimensions. arXiv preprint arXiv:2108.10037, 2021.
  32. Fat-shattering dimension of k-fold maxima. arXiv preprint arXiv:2110.04763, 2021.
  33. Nearest-neighbor sample compression: Efficiency, consistency, infinite dimensions. Advances in Neural Information Processing Systems, 30, 2017.
  34. Unlabeled compression schemes for maximum classes. Journal of Machine Learning Research, 8:2047–2081, 2007.
  35. Relating data compression and learnability. 1986a.
  36. Relating data compression and learnability. Technical report, Department of Computer and Information Sciences, Santa Cruz, CA, 1986b.
  37. Honest compressions and their application to compression schemes. In Conference on Learning Theory, pages 77–92, 2013.
  38. Shahar Mendelson. Learning without concentration. J. ACM, 62(3):21:1–21:25, 2015.
  39. VC classes are adversarially robustly learnable, but only improperly. In Conference on Learning Theory, pages 2512–2530. PMLR, 2019.
  40. Reducing adversarially robust learning to non-robust pac learning. Advances in Neural Information Processing Systems, 33:14626–14637, 2020.
  41. Adversarially robust learning with unknown perturbation sets. In Conference on Learning Theory, pages 3452–3482. PMLR, 2021.
  42. Adversarially robust learning: A generic minimax optimal learner and characterization. Advances in Neural Information Processing Systems, 35:37458–37470, 2022.
  43. Sample compression schemes for VC classes. Journal of the ACM (JACM), 63(3):1–10, 2016.
  44. Teaching and compressing for low VC-dimension. In A Journey Through Discrete Mathematics, pages 633–656. Springer, 2017.
  45. Chirag Pabbaraju. Multiclass learnability does not imply sample compression. arXiv preprint arXiv:2308.06424, 2023.
  46. David Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.
  47. David Pollard. Empirical processes: theory and applications. NSF-CBMS Regional Conference Series in Probability and Statistics, 2. Institute of Mathematical Statistics, 1990a.
  48. David Pollard. Empirical processes: theory and applications. In NSF-CBMS regional conference series in probability and statistics, pages i–86. JSTOR, 1990b.
  49. A geometric approach to sample compression. Journal of Machine Learning Research, 13(4), 2012.
  50. Shifting: One-inclusion mistake bounds and sample compression. Journal of Computer and System Sciences, 75(1):37–59, 2009.
  51. Manfred K. Warmuth. Compressing to VC dimension many points. In Proceedings of the 16th Conference on Learning Theory, 2003.
  52. A compression technique for analyzing disagreement-based active learning. J. Mach. Learn. Res., 16:713–745, 2015.
Authors (4)
  1. Idan Attias (21 papers)
  2. Steve Hanneke (73 papers)
  3. Aryeh Kontorovich (65 papers)
  4. Menachem Sadigurschi (6 papers)
Citations (4)
