PAC-Bayesian Offline Contextual Bandits With Guarantees (2210.13132v2)

Published 24 Oct 2022 in stat.ML and cs.LG

Abstract: This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.
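The abstract concerns off-policy learning in contextual bandits, where a target policy is evaluated and improved using only data logged by a previous policy. As background, a minimal sketch of the standard clipped inverse-propensity-score (IPS) off-policy value estimate is shown below; this is the classical estimator that bound-based approaches like the paper's build on, not the paper's own algorithm, and all data and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged bandit data: contexts, actions drawn by a logging
# policy, observed rewards, and the logging propensities.
n, n_actions = 10_000, 5
logging_probs = np.full((n, n_actions), 1.0 / n_actions)  # uniform logger
actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, 0.5, size=n).astype(float)

def ips_value(target_probs, actions, rewards, logging_probs, clip=10.0):
    """Clipped IPS estimate of the target policy's expected reward."""
    idx = np.arange(len(actions))
    # Importance weights: target propensity over logging propensity.
    w = target_probs[idx, actions] / logging_probs[idx, actions]
    # Clipping trades a little bias for much lower variance.
    return float(np.mean(np.minimum(w, clip) * rewards))

# Sanity check: evaluating the logging policy against itself gives
# importance weights of exactly 1, so the estimate is the mean reward.
v = ips_value(logging_probs, actions, rewards, logging_probs)
```

A PAC-Bayesian treatment, as in the paper, replaces naive maximization of such an estimate with optimization of a high-probability lower bound on the true policy value, so that improvement over the logging policy is certified offline.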

Authors (3)
  1. Otmane Sakhi (11 papers)
  2. Pierre Alquier (48 papers)
  3. Nicolas Chopin (52 papers)
Citations (12)
