Utility-based Dueling Bandits as a Partial Monitoring Game (1507.02750v2)
Abstract: Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expect advice, dynamic pricing, dark pools, and label efficient prediction. We study the utility-based dueling bandit problem as an instance of partial monitoring problem and prove that it fits the time-regret partial monitoring hierarchy as an easy - i.e. Theta (sqrt{T})- instance. We survey some partial monitoring algorithms and see how they could be used to solve dueling bandits efficiently. Keywords: Online learning, Dueling Bandits, Partial Monitoring, Partial Feedback, Multiarmed Bandits
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.