
Hardness of Learning Boolean Functions from Label Proportions (2403.19401v1)

Published 28 Mar 2024 in cs.CC, cs.DS, and cs.LG

Abstract: In recent years the framework of learning from label proportions (LLP) has been gaining importance in machine learning. In this setting, the training examples are aggregated into subsets or bags and only the average label per bag is available for learning an example-level predictor. This generalizes traditional PAC learning, which is the special case of unit-sized bags. The computational learning aspects of LLP were studied in recent works (Saket, NeurIPS'21; Saket, NeurIPS'22) which showed algorithms and hardness for learning halfspaces in the LLP setting. In this work we focus on the intractability of LLP learning of Boolean functions. Our first result shows that given a collection of bags of size at most $2$ which are consistent with an OR function, it is NP-hard to find a CNF of constantly many clauses which satisfies any constant fraction of the bags. This is in contrast with the work of (Saket, NeurIPS'21), which gave a $(2/5)$-approximation for learning ORs using a halfspace. Thus, our result provides a separation between constant-clause CNFs and halfspaces as hypotheses for LLP learning of ORs. Next, we prove the hardness of satisfying more than a $1/2 + o(1)$ fraction of such bags using a $t$-DNF (i.e. a DNF where each term has $\leq t$ literals) for any constant $t$. In usual PAC learning such a hardness was known (Khot-Saket, FOCS'08) only for learning noisy ORs. We also study the learnability of parities and show that it is NP-hard to satisfy more than a $(q/2^{q-1} + o(1))$-fraction of $q$-sized bags which are consistent with a parity using a parity, while a random-parity-based algorithm achieves a $(1/2^{q-2})$-approximation.
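To make the LLP setup concrete, below is a minimal Python sketch of the central notion: a hypothesis *satisfies* a bag exactly when its average prediction over the bag's feature vectors equals the bag's label proportion. It also includes a toy stand-in for the random-parity baseline mentioned in the abstract (best-of-several random parities rather than a single draw). All names, the trial count, and the brute-force instance construction are illustrative assumptions, not from the paper.

```python
import itertools
import random

# A bag is (examples, proportion): a tuple of 0/1 feature vectors
# together with the average label over those examples.

def bag_satisfied(h, bag, eps=1e-9):
    """h satisfies a bag iff its average prediction over the bag's
    examples equals the bag's label proportion."""
    examples, proportion = bag
    avg = sum(h(x) for x in examples) / len(examples)
    return abs(avg - proportion) < eps

def make_or(support):
    """OR of the variables with indices in `support`."""
    return lambda x: int(any(x[i] for i in support))

def make_parity(support):
    """Parity (XOR) of the variables with indices in `support`."""
    return lambda x: sum(x[i] for i in support) % 2

def random_parity_baseline(bags, n, trials=200, seed=0):
    """Best fraction of bags satisfied by any of `trials` uniformly
    random parities -- a toy stand-in for the paper's random-parity
    algorithm; the trial count is an arbitrary illustrative choice."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        support = [i for i in range(n) if rng.random() < 0.5]
        h = make_parity(support)
        best = max(best, sum(bag_satisfied(h, b) for b in bags) / len(bags))
    return best

if __name__ == "__main__":
    n = 4
    points = list(itertools.product((0, 1), repeat=n))

    # Bags of size 2 whose proportions are consistent with the target
    # OR(x0, x1), as in the paper's first setting.
    target = make_or([0, 1])
    or_bags = [((a, b), (target(a) + target(b)) / 2)
               for a, b in itertools.combinations(points, 2)]
    print("bags satisfied by the target OR:",
          sum(bag_satisfied(target, b) for b in or_bags), "/", len(or_bags))

    # q-sized bags consistent with a hidden parity, and the fraction
    # that a random-parity search recovers on this toy instance.
    q, hidden = 3, make_parity([0, 2])
    parity_bags = [(c, sum(hidden(x) for x in c) / q)
                   for c in itertools.combinations(points, q)]
    print("random-parity fraction:", random_parity_baseline(parity_bags, n))
```

On such a tiny instance the random search will typically find a parity satisfying all bags; the paper's point is the asymptotic gap between the $1/2^{q-2}$ fraction achievable by a random parity and the NP-hardness of exceeding a $q/2^{q-1} + o(1)$ fraction.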

References (25)
  1. S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. J. Comput. Syst. Sci., 54(2):317–331, 1997.
  2. S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof verification and the hardness of approximation problems. J. ACM, 45(3):501–555, 1998.
  3. S. Arora and S. Safra. Probabilistic checking of proofs: A new characterization of NP. J. ACM, 45(1):70–122, 1998.
  4. D. Barucic and J. Kybic. Fast learning from label proportions with small bags. CoRR, abs/2110.03426, 2021.
  5. Deep learning from label proportions for emphysema quantification. In MICCAI, volume 11071 of Lecture Notes in Computer Science, pages 768–776. Springer, 2018.
  6. Easy learning from label proportions. arXiv, 2023.
  7. Learning from aggregated data: Curated bags versus random bags. arXiv, 2023.
  8. Cost-based labeling of groups of mass spectra. In Proc. ACM SIGMOD International Conference on Management of Data, pages 167–178, 2004.
  9. Weakly supervised classification in high energy physics. Journal of High Energy Physics, 2017(5):1–11, 2017.
  10. V. Feldman, V. Guruswami, P. Raghavendra, and Y. Wu. Agnostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41(6):1558–1590, 2012.
  11. S. Ghoshal and R. Saket. Hardness of learning DNFs using halfspaces. In Proc. STOC, pages 467–480, 2021.
  12. V. Guruswami, P. Raghavendra, R. Saket, and Y. Wu. Bypassing UGC from some optimal geometric inapproximability results. ACM Trans. Algorithms, 12(1):6:1–6:25, 2016.
  13. J. Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001.
  14. Fitting the data from embryo implantation prediction: Learning from label proportions. Statistical Methods in Medical Research, 27(4):1056–1066, 2018.
  15. S. Khot and R. Saket. Hardness of minimizing and learning DNF expressions. In Proc. FOCS, pages 231–240, 2008.
  16. Challenges and approaches to privacy preserving post-click conversion prediction. CoRR, abs/2201.12666, 2022.
  17. Quantifying emphysema extent from weakly labeled CT scans of the lungs using label proportions learning. In The Sixth International Workshop on Pulmonary Image Analysis, pages 31–42, 2016.
  18. R. O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  19. R. Raz. A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998.
  20. S. Rueping. SVM classifier estimation from group probabilities. In Proc. ICML, pages 911–918, 2010.
  21. R. Saket. Learnability of linear thresholds from label proportions. In Proc. NeurIPS, 2021.
  22. R. Saket. Algorithms and hardness for learning linear thresholds from label proportions. In Proc. NeurIPS, 2022.
  23. L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.
  24. Using published medical results and non-homogenous data in rule learning. In Proc. International Conference on Machine Learning and Applications and Workshops, volume 2, pages 84–89. IEEE, 2011.
  25. On learning from label proportions. CoRR, abs/1402.5902, 2014.