Combinatorial Stochastic-Greedy Bandit (2312.08057v1)

Published 13 Dec 2023 in cs.LG, cs.AI, math.CO, math.OC, and stat.ML

Abstract: We propose a novel combinatorial stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time step $t\in [T]$ is observed. SGB adopts an optimized stochastic-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms. Unlike existing methods that explore the entire set of unselected base arms during each selection step, our SGB algorithm samples only an optimized proportion of unselected arms and selects actions from this subset. We prove that our algorithm achieves a $(1-1/e)$-regret bound of $\mathcal{O}\big(n^{\frac{1}{3}} k^{\frac{2}{3}} T^{\frac{2}{3}} \log(T)^{\frac{2}{3}}\big)$ for monotone stochastic submodular rewards, which improves on the state-of-the-art dependence on the cardinality constraint $k$. Furthermore, we empirically evaluate the performance of our algorithm in the context of online constrained social influence maximization. Our results demonstrate that our proposed approach consistently outperforms the other algorithms, with the performance gap increasing as $k$ grows.
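Here $n$ is the number of base arms, $k$ the cardinality constraint on the selected set, and $T$ the horizon. Because maximizing a monotone submodular function under a cardinality constraint cannot be approximated beyond a factor of $(1-1/e)$ in polynomial time, regret is measured against that scaled benchmark. Following the usual convention in this literature (the paper's exact formulation may differ in details), the $(1-1/e)$-regret is

$$\mathcal{R}_T \;=\; \Big(1-\tfrac{1}{e}\Big)\, T\, f(S^{*}) \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} f(S_t)\Big],$$

where $f$ is the expected reward function, $S^{*}$ is an optimal set of size $k$, and $S_t$ is the set played at round $t$.

To make the "samples only an optimized proportion of unselected arms" step concrete, the sketch below renders a stochastic-explore-then-commit loop under full-bandit feedback in Python. It is a minimal illustration under stated assumptions, not the paper's exact procedure: the sample size follows the classical stochastic-greedy rate $\lceil (n/k)\log(1/\epsilon)\rceil$, the per-candidate exploration budget `m` is a crude stand-in for the optimized schedule that yields the stated regret bound, and `play` is a hypothetical noisy reward oracle supplied by the caller.

```python
import math
import random

def sgb(n, k, T, play, epsilon=0.1):
    """Sketch of a combinatorial stochastic-explore-then-commit bandit.

    n       -- number of base arms, labeled 0..n-1 (assumes k <= n)
    k       -- cardinality constraint on the selected set
    T       -- horizon (total number of set plays)
    play    -- hypothetical oracle: play(S) returns a noisy reward for set S
    epsilon -- controls the sampled proportion of unselected arms
    """
    assert k <= n
    selected, t = [], 0
    # Per-candidate exploration budget: a crude stand-in for the optimized
    # schedule the paper derives, which balances exploration cost against
    # the commit phase to obtain T^{2/3}-type regret.
    m = max(1, int((T / k) ** (2 / 3)))
    for _ in range(k):
        remaining = [a for a in range(n) if a not in selected]
        # Stochastic-greedy step: sample only ~(n/k) log(1/eps) of the
        # unselected arms instead of scanning all of them.
        s = min(len(remaining),
                max(1, math.ceil((n / k) * math.log(1 / epsilon))))
        candidates = random.sample(remaining, s)
        # Estimate each candidate's marginal value from m noisy plays of
        # the augmented set; only the joint reward is observed.
        best_arm, best_mean = None, -math.inf
        for a in candidates:
            total = 0.0
            for _ in range(m):
                total += play(selected + [a])
                t += 1
            if total / m > best_mean:
                best_arm, best_mean = a, total / m
        selected.append(best_arm)
    # Commit phase: exploit the constructed set for the remaining rounds.
    # (The sketch assumes the exploration budget fits within T.)
    while t < T:
        play(selected)
        t += 1
    return selected
```

For example, with a noisy coverage-style reward oracle `play`, calling `sgb(n=50, k=5, T=100_000, play=play)` spends roughly $k \cdot s \cdot m$ rounds building the set greedily and commits thereafter. The saving over full-greedy exploration is that each of the $k$ phases evaluates only about $(n/k)\log(1/\epsilon)$ candidates rather than all unselected arms, which is intuitively the source of the improved dependence on $k$ in the regret bound.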
