
Balanced off-policy evaluation in general action spaces (1906.03694v4)

Published 9 Jun 2019 in cs.LG, stat.ME, and stat.ML

Abstract: Estimation of importance sampling weights for off-policy evaluation of contextual bandits often results in imbalance - a mismatch between the desired and the actual distribution of state-action pairs after weighting. In this work we present balanced off-policy evaluation (B-OPE), a generic method for estimating weights which minimize this imbalance. Estimation of these weights reduces to a binary classification problem regardless of action type. We show that minimizing the risk of the classifier implies minimization of imbalance to the desired counterfactual distribution of state-action pairs. The classifier loss is tied to the error of the off-policy estimate, allowing for easy tuning of hyperparameters. We provide experimental evidence that B-OPE improves weighting-based approaches for offline policy evaluation in both discrete and continuous action spaces.
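The abstract's key idea — reducing weight estimation to binary classification regardless of action type — can be sketched as follows. This is a minimal illustration of classifier-based density-ratio weighting, not the paper's exact procedure: the synthetic Gaussian policies, the logistic-regression classifier, and all variable names are assumptions for the demo. Behavior-policy pairs are labeled 0, target-policy pairs 1, and the fitted class probabilities yield importance weights via the odds ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic contextual bandit: 2-d contexts x, continuous scalar actions a.
n = 5000
x = rng.normal(size=(n, 2))

# Hypothetical behavior and target policies (both Gaussian in the context).
w_behavior, w_target = np.array([1.0, 0.0]), np.array([0.5, 0.5])
a_behavior = x @ w_behavior + rng.normal(size=n)
a_target = x @ w_target + rng.normal(size=n)

# Binary classification problem over (context, action) pairs:
# label 0 = sampled under the behavior policy, 1 = under the target policy.
X = np.vstack([np.column_stack([x, a_behavior]),
               np.column_stack([x, a_target])])
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = LogisticRegression().fit(X, y)

# Importance weights for the logged behavior pairs:
# w(s, a) = P(target | s, a) / P(behavior | s, a), the classifier's odds.
p_target = clf.predict_proba(np.column_stack([x, a_behavior]))[:, 1]
weights = p_target / (1.0 - p_target)
print(weights.shape, bool((weights > 0).all()))
```

A held-out classification loss can then serve as the hyperparameter-tuning signal the abstract mentions, since lower classifier risk corresponds to lower imbalance between the weighted and counterfactual state-action distributions.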

Authors (3)
  1. Arjun Sondhi (7 papers)
  2. David Arbour (30 papers)
  3. Drew Dimmery (13 papers)
Citations (17)
