Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs (2308.04314v1)

Published 8 Aug 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.
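
To make the setting concrete, below is a minimal toy sketch (not the paper's algorithm) of the two ingredients the abstract refers to: several agents each run a UCB-style rule on pooled arm statistics, and they exchange those statistics only at a few trigger rounds (here, powers of two) instead of every step. All names, the trigger rule, and the Bernoulli reward model are illustrative assumptions.

```python
import numpy as np

def cooperative_ucb(means, n_agents=4, horizon=10_000, seed=0):
    """Toy cooperative multi-agent UCB with sparse broadcasts.

    Illustrative only -- NOT the algorithm from the paper. Each agent
    keeps local statistics since the last broadcast, plays a UCB index
    on (shared + local) statistics, and all agents broadcast only when
    the round number is a power of two.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    local_n = np.zeros((n_agents, k))   # local pull counts since last broadcast
    local_s = np.zeros((n_agents, k))   # local reward sums since last broadcast
    glob_n = np.zeros(k)                # shared pull counts
    glob_s = np.zeros(k)                # shared reward sums
    messages = 0
    regret = 0.0
    best = max(means)

    for t in range(1, horizon + 1):
        for a in range(n_agents):
            n = glob_n + local_n[a]
            s = glob_s + local_s[a]
            if np.any(n == 0):
                # Pull each arm at least once before using the UCB index.
                arm = int(np.argmin(n))
            else:
                ucb = s / n + np.sqrt(2.0 * np.log(t * n_agents) / n)
                arm = int(np.argmax(ucb))
            reward = rng.binomial(1, means[arm])
            local_n[a, arm] += 1
            local_s[a, arm] += reward
            regret += best - means[arm]

        # Sparse communication trigger: broadcast only when t is a power of two.
        if t & (t - 1) == 0:
            glob_n += local_n.sum(axis=0)
            glob_s += local_s.sum(axis=0)
            local_n[:] = 0
            local_s[:] = 0
            messages += n_agents  # one broadcast per agent

    return regret, messages

if __name__ == "__main__":
    r, m = cooperative_ucb([0.9, 0.8, 0.5, 0.4])
    print(f"group regret ~ {r:.1f}, messages sent = {m}")
```

With this doubling trigger the number of broadcasts grows only logarithmically in the horizon; the paper's contribution is a communication policy and learning algorithm that push this further, achieving optimal individual regret with communication cost that is constant in the horizon.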

Authors (6)
  1. Lin Yang (212 papers)
  2. Xuchuang Wang (18 papers)
  3. Mohammad Hajiesmaili (47 papers)
  4. Lijun Zhang (239 papers)
  5. John C. S. Lui (112 papers)
  6. Don Towsley (177 papers)
Citations (4)
