
Doubly Calibrated Estimator for Recommendation on Data Missing Not At Random (2403.00817v1)

Published 26 Feb 2024 in cs.IR and cs.LG

Abstract: Recommender systems often suffer from selection bias as users tend to rate their preferred items. The datasets collected under such conditions exhibit entries missing not at random and thus are not randomized controlled trials representing the target population. To address this challenge, a doubly robust estimator and its enhanced variants have been proposed as they ensure unbiasedness when accurate imputed errors or predicted propensities are provided. However, we argue that existing estimators rely on miscalibrated imputed errors and propensity scores as they depend on rudimentary models for estimation. We provide theoretical insights into how miscalibrated imputation and propensity models may limit the effectiveness of doubly robust estimators and validate our theorems using real-world datasets. On this basis, we propose a Doubly Calibrated Estimator that involves the calibration of both the imputation and propensity models. To achieve this, we introduce calibration experts that consider different logit distributions across users. Moreover, we devise a tri-level joint learning framework, allowing the simultaneous optimization of calibration experts alongside prediction and imputation models. Through extensive experiments on real-world datasets, we demonstrate the superiority of the Doubly Calibrated Estimator in the context of debiased recommendation tasks.
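To make the abstract's moving parts concrete, here is a minimal NumPy sketch, not the authors' implementation: the standard doubly robust estimator from the debiased-recommendation literature, plus a mixture of temperature-scaling "calibration experts" with per-example gating in the spirit of the approach described above. All function names, shapes, and values are illustrative assumptions.

```python
import numpy as np

def doubly_robust_estimate(o, e, e_hat, p_hat):
    """Estimate the average prediction error over all user-item pairs.

    o:     (n,) binary array, 1 if the rating was observed
    e:     (n,) true errors, only meaningful where o == 1
    e_hat: (n,) imputed errors from the imputation model
    p_hat: (n,) predicted propensities in (0, 1]
    """
    # DR combines the imputed error with an inverse-propensity-weighted
    # correction on observed entries; it is unbiased if either the imputed
    # errors or the propensities are accurate, which is why miscalibration
    # in both models is the failure mode the paper targets.
    correction = o * (e - e_hat) / p_hat
    return np.mean(e_hat + correction)

def calibrated_propensity(logits, expert_temps, gate_weights):
    """Hypothetical mixture of temperature-scaling calibration experts.

    logits:       (n,) raw propensity-model logits
    expert_temps: (k,) temperature of each calibration expert
    gate_weights: (n, k) per-example (e.g. per-user) expert assignment weights
    """
    # Each expert rescales the logit by its own temperature; the gate mixes
    # experts so that users with different logit distributions can be
    # calibrated differently, as the abstract suggests.
    scaled = logits[:, None] / expert_temps[None, :]   # (n, k)
    probs = 1.0 / (1.0 + np.exp(-scaled))              # sigmoid per expert
    return (gate_weights * probs).sum(axis=1)          # (n,)

# Toy usage with 4 user-item pairs and 2 experts (all values illustrative).
o     = np.array([1, 0, 1, 0])
e     = np.array([0.2, 0.0, 0.5, 0.0])   # errors only defined where o == 1
e_hat = np.array([0.25, 0.1, 0.4, 0.3])
p_hat = calibrated_propensity(
    logits=np.array([1.2, -0.3, 0.8, -1.5]),
    expert_temps=np.array([1.0, 2.0]),
    gate_weights=np.array([[0.7, 0.3]] * 4),
)
print(doubly_robust_estimate(o, e, e_hat, p_hat))
```

In the paper's framing, the expert temperatures and gates would be learned jointly with the prediction and imputation models in the tri-level framework; this sketch only shows the inference-time shape of the computation.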

