Is Transductive Learning Equivalent to PAC Learning? (2405.05190v2)

Published 8 May 2024 in stat.ML, cs.DS, cs.LG, math.ST, and stat.TH

Abstract: Much of learning theory is concerned with the design and analysis of probably approximately correct (PAC) learners. The closely related transductive model of learning has recently seen more scrutiny, with its learners often used as precursors to PAC learners. Our goal in this work is to understand and quantify the exact relationship between these two models. First, we observe that modest extensions of existing results show the models to be essentially equivalent for realizable learning for most natural loss functions, up to low order terms in the error and sample complexity. The situation for agnostic learning appears less straightforward, with sample complexities potentially separated by a $\frac{1}{\epsilon}$ factor. This is therefore where our main contributions lie. Our results are two-fold: 1. For agnostic learning with bounded losses (including, for example, multiclass classification), we show that PAC learning reduces to transductive learning at the cost of low-order terms in the error and sample complexity via an adaptation of the reduction of arXiv:2304.09167 to the agnostic setting. 2. For agnostic binary classification, we show the converse: transductive learning is essentially no more difficult than PAC learning. Together with our first result this implies that the PAC and transductive models are essentially equivalent for agnostic binary classification. This is our most technical result, and involves two steps: A symmetrization argument on the agnostic one-inclusion graph (OIG) of arXiv:2309.13692 to derive the worst-case agnostic transductive instance, and expressing the error of the agnostic OIG algorithm for this instance in terms of the empirical Rademacher complexity of the class. We leave as an intriguing open question whether our second result can be extended beyond binary classification to show the transductive and PAC models equivalent more broadly.
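
For orientation, the models compared above can be pinned down with standard definitions; the notation here ($\mathcal{H}$ for the hypothesis class, $\ell$ for a bounded loss, $S$ for the sample) is generic textbook convention rather than the paper's own formalism. In agnostic PAC learning, a learner $A$ has sample complexity $m(\epsilon, \delta)$ if for every distribution $D$ over labeled examples, $\Pr_{S \sim D^{m}}\big[L_D(A(S)) \le \inf_{h \in \mathcal{H}} L_D(h) + \epsilon\big] \ge 1 - \delta$, where $L_D(h) = \mathbb{E}_{(x,y)\sim D}[\ell(h(x), y)]$. In the transductive (leave-one-out) model used in this line of work, an adversary fixes $n$ labeled examples, the learner is shown $n-1$ of them together with the remaining unlabeled point, and its error is the expected loss on that point over a uniformly random choice of the held-out index. The empirical Rademacher complexity invoked in the second result is the usual $\widehat{\mathfrak{R}}_S(\mathcal{H}) = \mathbb{E}_{\sigma}\big[\sup_{h \in \mathcal{H}} \tfrac{1}{n}\sum_{i=1}^{n} \sigma_i\, h(x_i)\big]$, with $\sigma_1,\dots,\sigma_n$ i.i.d. uniform on $\{-1,+1\}$.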

References (35)
  1. Optimal PAC bounds without uniform convergence. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1203–1223, Los Alamitos, CA, USA, Nov. 2023. IEEE Computer Society. doi: 10.1109/FOCS57990.2023.00071.
  2. The one-inclusion graph algorithm is not always optimal. In The Thirty Sixth Annual Conference on Learning Theory, pages 72–88. PMLR, 2023.
  3. A theory of PAC learnability of partial concept classes. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 658–671. IEEE, 2022.
  4. Regularization and optimal multiclass learning. arXiv preprint arXiv:2309.13692, 2023.
  5. Learnability is a compact property. arXiv preprint arXiv:2402.10360, 2024.
  6. Optimal learners for realizable regression: PAC learning and online learning. arXiv preprint arXiv:2307.03848, 2023.
  7. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56(2):174–190, 1998. ISSN 0022-0000. doi: https://doi.org/10.1006/jcss.1997.1557.
  8. Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52(3):434–452, 1996. ISSN 0022-0000. doi: https://doi.org/10.1006/jcss.1996.0033.
  9. Occam’s razor. Information Processing Letters, 24(6):377–380, 1987. ISSN 0020-0190. doi: https://doi.org/10.1016/0020-0190(87)90114-1.
  10. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929–965, Oct. 1989. ISSN 0004-5411. doi: 10.1145/76359.76371.
  11. A theory of universal learning. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pages 532–541, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380539. doi: 10.1145/3406325.3451087.
  12. A characterization of multiclass learnability. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 943–955. IEEE, 2022.
  13. A. Daniely and S. Shalev-Shwartz. Optimal learners for multiclass problems. In M. F. Balcan, V. Feldman, and C. Szepesvári, editors, Proceedings of The 27th Conference on Learning Theory, volume 35 of Proceedings of Machine Learning Research, pages 287–316, Barcelona, Spain, 13–15 Jun 2014. PMLR.
  15. Multiclass learnability and the ERM principle. In S. M. Kakade and U. von Luxburg, editors, Proceedings of the 24th Annual Conference on Learning Theory, volume 19 of Proceedings of Machine Learning Research, pages 207–232, Budapest, Hungary, 09–11 Jun 2011. PMLR.
  15. Supervised learning through the lens of compression. Advances in Neural Information Processing Systems, 29, 2016.
  16. R. M. Dudley. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. Journal of Functional Analysis, 1:125–165, 1967.
  17. S. Hanneke. The optimal sample complexity of PAC learning. Journal of Machine Learning Research, 17(38):1–15, 2016.
  18. D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992. ISSN 0890-5401. doi: https://doi.org/10.1016/0890-5401(92)90010-D.
  19. D. Haussler. Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory, Series A, 69(2):217–232, 1995. ISSN 0097-3165. doi: https://doi.org/10.1016/0097-3165(95)90052-7.
  20. Predicting {0, 1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994. ISSN 0890-5401. doi: https://doi.org/10.1006/inco.1994.1097.
  21. A. Kupavskii and N. Zhivotovskiy. When are epsilon-nets small? J. Comput. Syst. Sci., 110(C):22–36, Jun. 2020. ISSN 0022-0000. doi: 10.1016/j.jcss.2019.12.006.
  22. W. Kuszmaul and Q. Qi. The multiplicative version of Azuma's inequality, with an application to contention analysis. arXiv preprint arXiv:2102.05077, 2021.
  23. K. G. Larsen. Bagging is an optimal PAC learner. In The Thirty Sixth Annual Conference on Learning Theory, pages 450–468. PMLR, 2023.
  24. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. In 28th Annual Symposium on Foundations of Computer Science (sfcs 1987), pages 68–77, 1987. doi: 10.1109/SFCS.1987.37.
  25. N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, Apr. 1988. ISSN 1573-0565. doi: 10.1023/A:1022869011914.
  26. B. K. Natarajan. On learning sets and functions. Machine Learning, 4:67–97, 1989.
  27. P. Rebeschini. Algorithmic foundations of learning. Lecture notes, University of Oxford, Department of Statistics, 2022. Available at https://web.archive.org/web/20231001122838/https://www.stats.ox.ac.uk/~rebeschi/teaching/AFoL/22/material/lecture05.pdf.
  28. Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, volume 19. MIT Press, 2006.
  29. S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning - From Theory to Algorithms. Cambridge University Press, 2014. ISBN 978-1-10-705713-5.
  30. H. U. Simon. General bounds on the number of examples needed for learning probabilistic concepts. Journal of Computer and System Sciences, 52(2):239–254, 1996. ISSN 0022-0000. doi: https://doi.org/10.1006/jcss.1996.0019.
  31. M. Talagrand. Majorizing measures: The generic chaining. The Annals of Probability, 24(3):1049–1103, 1996. ISSN 00911798.
  32. L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, Nov. 1984. ISSN 0001-0782. doi: 10.1145/1968.1972.
  33. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264–280, 1971. doi: 10.1137/1116025.
  34. M. K. Warmuth. The optimal PAC algorithm. In International Conference on Computational Learning Theory, pages 641–642. Springer, 2004.
  35. Expected worst case regret via stochastic sequential covering. arXiv preprint arXiv:2209.04417, 2022.