Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback (2009.07518v1)

Published 16 Sep 2020 in cs.LG, cs.AI, and stat.ML

Abstract: Recent work on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) shows good results on a global accuracy metric. In recommender systems, this can be achieved through personalization. However, with a combinatorial online learning approach, personalization requires a large amount of user feedback. Such feedback can be hard to acquire when users need to be directly and frequently solicited. For many fields of activity undergoing the digitization of their business, online learning is unavoidable. Thus, a number of approaches allowing implicit user feedback retrieval have been implemented. Nevertheless, this implicit feedback can be misleading or inefficient for the agent's learning. Herein, we propose a novel approach that reduces the amount of explicit feedback required by COM-MAB algorithms while providing levels of global accuracy and learning efficiency similar to those of classical competitive methods. We present this approach for considering user feedback and evaluate it using three distinct strategies. Even when users return only a limited share of feedback (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
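
To make the setting in the abstract concrete, the following is a minimal sketch of a combinatorial semi-bandit loop in which the user returns feedback in only a fraction of rounds. It is not the authors' method: the abstract does not specify their algorithm or their three strategies, so the arm selection here uses plain Thompson sampling with Beta posteriors as a stand-in. The function name `run_partial_feedback_bandit` and the parameters `means`, `k`, `feedback_rate`, `rounds`, and `seed` are all hypothetical, and rewards are assumed to be Bernoulli.

```python
import random


def run_partial_feedback_bandit(means, k, feedback_rate, rounds, seed=0):
    """Simulate a top-k semi-bandit where feedback arrives only sometimes.

    means         -- assumed true Bernoulli reward probability of each arm
    k             -- number of arms recommended per round (the "super arm")
    feedback_rate -- probability that the user returns feedback in a round
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms          # observed feedback count per arm
    successes = [0] * n_arms      # observed positive feedback per arm
    feedback_rounds = 0

    for _ in range(rounds):
        # Thompson sampling (stand-in policy): draw one sample per arm from
        # its Beta posterior and recommend the k arms with the largest draws.
        samples = [
            rng.betavariate(1 + successes[a], 1 + pulls[a] - successes[a])
            for a in range(n_arms)
        ]
        super_arm = sorted(range(n_arms), key=lambda a: samples[a], reverse=True)[:k]

        # Scarce feedback: in most rounds the user is not solicited,
        # so the agent observes nothing and updates nothing.
        if rng.random() >= feedback_rate:
            continue
        feedback_rounds += 1

        # Semi-bandit feedback: one reward is observed per recommended arm.
        for a in super_arm:
            reward = 1 if rng.random() < means[a] else 0
            pulls[a] += 1
            successes[a] += reward

    estimates = [successes[a] / pulls[a] if pulls[a] else 0.0 for a in range(n_arms)]
    return estimates, pulls, feedback_rounds


if __name__ == "__main__":
    est, pulls, fb = run_partial_feedback_bandit(
        means=[0.1, 0.5, 0.9, 0.3], k=2, feedback_rate=0.2, rounds=2000, seed=1
    )
    print("rounds with feedback:", fb)
    print("pulls per arm:", pulls)
    print("estimated means:", [round(e, 2) for e in est])
```

Setting `feedback_rate=0.2` mirrors the "as low as 20%" figure quoted in the abstract; comparing runs at `feedback_rate=1.0` and `feedback_rate=0.2` gives a rough feel for how much learning slows when feedback is scarce, under the assumptions stated above.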

Authors (4)

Alexandre Letard
Tassadit Amghar
Olivier Camp
Nicolas Gutowski

Citations (3)
