Emergent Mind

A Model-based Multi-Agent Personalized Short-Video Recommender System

(2405.01847)
Published May 3, 2024 in cs.IR and cs.AI

Abstract

A recommender selects and presents top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it within a reinforcement learning (RL) framework has attracted increasing attention from both the academic and industry communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch-time in an environment of multi-aspect user preferences via a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our approach has been deployed on a real large-scale short-video sharing platform, successfully serving hundreds of millions of users.

Implementation of the proposed Model-based Multi-agent Ranking Framework (MMRF) from the research paper.

Overview

  • The paper introduces a Model-based Multi-agent Ranking Framework (MMRF) using a Reinforcement Learning approach to optimize video recommendations on short-video platforms, enhancing user engagement by focusing on multiple interaction facets such as likes and comments.

  • MMRF addresses sample selection bias by integrating non-impression sample data and employs a feedback fitting model to simulate user responses, improving the training of recommendation algorithms.

  • Experimental results validate MMRF's effectiveness, showing significant improvements in user metrics like WatchTime and interaction rates, with successful real-world application in large-scale A/B testing against other models.

Exploring a Reinforcement Learning Framework for Short-Video Recommendations

Introduction to the Problem

In the ever-evolving landscape of short-video applications—a domain dominated by giants like TikTok and YouTube Shorts—creating a personalized, engaging user experience is paramount. This experience primarily hinges on the app's ability to recommend videos that resonate with individual preferences. Traditionally, these recommendations have been handled by generating and ranking video suggestions during a user's session, but this approach has its complexities.

One of the main challenges is considering the long-term engagement (measured as cumulative satisfaction over a session) rather than focusing just on immediate user reactions. The paper addresses this by formulating the recommendation process as a Markov Decision Process (MDP) and solving it using a Reinforcement Learning (RL) framework. Specifically, it tackles the nuanced aspects of multi-dimensional user preferences and the omnipresent issue of sample selection bias in recommendations.
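In MDP terms, each request in a session is one step: the state is the user's current context, the action is the slate of videos shown, and the reward is the observed engagement (e.g., watch-time). The RL objective is then the discounted cumulative reward over the whole session rather than any single request's reward. The following sketch illustrates that objective; the function name, discount factor, and reward values are illustrative, not taken from the paper:

```python
# Hypothetical sketch of a recommendation session as an MDP episode.
# Each element of `rewards` is the engagement (e.g., watch-time in seconds)
# observed at one sequential request within the session.

def discounted_return(rewards, gamma=0.95):
    """Discounted cumulative reward over a session's requests."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A session of 4 sequential requests with per-request watch-time:
session_rewards = [30.0, 12.0, 45.0, 8.0]
print(round(discounted_return(session_rewards), 2))
```

Maximizing this discounted return is what lets the policy trade a slightly less clicky video now for higher cumulative satisfaction later in the session.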

Understanding the Proposed Solution

The core proposal of the paper is a Model-based Multi-agent Ranking Framework (MMRF), designed to maximize user engagement, specifically watch-time, within the constraints of multi-aspect user preferences. Here's how it works:

Multi-agent Collaboration Framework

  • The MMRF incorporates multiple agents where each agent aims to optimize for different facets of user interaction (like likes, follows, comments, etc.).
  • Instead of working in isolation, these agents interact and leverage information from one another to make informed decisions about which videos to recommend. This is orchestrated through an attention mechanism, allowing agents to focus on relevant information from peers to enhance the ranking process.
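The collaboration step above can be sketched as one agent attending over its peers' state embeddings before scoring. This is a minimal illustration of attention-based message passing between agents, not the paper's exact architecture; the dimensions, agent roles, and scoring head are assumptions:

```python
import numpy as np

# Illustrative sketch: each agent optimizes one feedback signal
# (watch-time, like, follow, comment) and aggregates the other agents'
# state embeddings via scaled dot-product attention.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, peer_states):
    """One agent's query attends over peer state embeddings."""
    d = query.shape[-1]
    scores = peer_states @ query / np.sqrt(d)  # (n_peers,)
    weights = softmax(scores)                  # attention over peers
    return weights @ peer_states               # aggregated peer message

rng = np.random.default_rng(0)
n_agents, d = 4, 8  # e.g., watch-time, like, follow, comment agents
states = rng.normal(size=(n_agents, d))

# Agent 0 (watch-time) enriches its own state with peer information,
# then feeds the result to its ranking head:
message = attend(states[0], states[1:])
enriched = np.concatenate([states[0], message])
print(enriched.shape)
```

The attention weights let each agent emphasize whichever peer signal is most informative for the current user state, rather than treating all interaction facets equally.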

Addressing Sample Selection Bias with Model-based Learning

  • A significant contribution of MMRF is its approach to mitigating sample selection bias (SSB): models trained only on impressed (shown) samples become skewed toward them. The framework extends training to non-impression samples (candidate videos that were never shown to the user) to counteract this bias.
  • It employs a feedback fitting model to simulate user reactions to these non-impression samples, integrating these simulated feedbacks into the training process to provide a more rounded learning experience for the RL system.
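The idea behind the two bullets above can be sketched as a two-stage procedure: fit a feedback model on impression data, then use it to label non-impression candidates with pseudo-feedback. The logistic model, feature construction, and data below are illustrative stand-ins, not the paper's actual feedback-fitting model:

```python
import numpy as np

# Hedged sketch of model-based augmentation for sample selection bias:
# a feedback-fitting model trained on impression (shown) samples predicts
# pseudo-feedback for non-impression candidates so they can join training.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_feedback_model(X, y, lr=0.1, steps=500):
    """Logistic regression via batch gradient descent on impression data."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(1)
X_shown = rng.normal(size=(200, 5))          # features of shown videos
y_shown = (X_shown[:, 0] > 0).astype(float)  # observed feedback (e.g., like)

w = fit_feedback_model(X_shown, y_shown)

# Non-impression candidates receive simulated feedback for RL training:
X_unshown = rng.normal(size=(50, 5))
pseudo_feedback = sigmoid(X_unshown @ w)
print(pseudo_feedback.shape)
```

Blending these simulated responses into the training set gives the RL policy signal about items the logging policy never exposed, which is the essence of the model-based correction.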

Experimental Insights and Results

The paper backs its claims with experimental evidence from both offline evaluations and live experiments.

Strong Numerical Results

  • Comparatively, MMRF shows a notable improvement over other methodologies, especially in WatchTime, where it achieves a 7.3% GAUC increase and a 7.1% NCIS increase.
  • It also shows robust performance across various user satisfaction metrics, indicating a well-rounded approach to optimizing multiple aspects of user engagement.
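For context on the GAUC figure above: GAUC (group AUC) computes AUC per user and averages the results weighted by each user's sample count, so heavy users don't drown out ranking quality for everyone else. The following is a minimal sketch with made-up data; the weighting-by-sample-count convention is a common choice, not necessarily the paper's exact variant:

```python
import numpy as np

def auc(scores, labels):
    """AUC via pairwise comparison (fine for small per-user sets)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return None  # undefined for single-class users; skipped in GAUC
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def gauc(user_data):
    """user_data: {user_id: (scores, labels)}; weight = samples per user."""
    num = den = 0.0
    for scores, labels in user_data.values():
        a = auc(np.asarray(scores, float), np.asarray(labels))
        if a is None:
            continue
        w = len(labels)
        num += w * a
        den += w
    return num / den

data = {
    "u1": ([0.9, 0.2, 0.7], [1, 0, 1]),  # perfectly ranked: AUC 1.0
    "u2": ([0.4, 0.8], [1, 0]),          # inverted ranking: AUC 0.0
}
print(round(gauc(data), 3))
```

A per-user metric like this rewards models that rank well for each individual, which is why it is a standard offline proxy for personalized ranking quality.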

A/B Testing in a Real-World Scenario

  • When deployed on a large-scale industrial platform, MMRF outperformed existing models, demonstrating enhancements in user watch-time and interaction rates.
  • The online A/B testing provided additional validation of the framework's effectiveness in a live environment, showing superior performance over standard learning-to-rank and other advanced RL approaches.

Theoretical and Practical Implications

The MMRF framework isn't just a theoretical contribution; it's a practical toolkit that has been effectively integrated into a real-world application, serving hundreds of millions of users. The use of a collaborative multi-agent system and the innovative approach to tackle SSB sets a precedent for future work in both academic and industrial settings.

Future Speculations in AI and Recommendations

Looking ahead, the principles laid out in MMRF could translate into more sophisticated, context-aware systems in other domains of AI-driven recommendations. One could imagine similar frameworks being adapted for personalized learning experiences in education tech, dynamic content delivery in streaming services, or even for enhancing user interactions in virtual reality settings.

In conclusion, the RL-based MMRF framework marks a significant step towards solving complex, real-world problems in the recommender systems domain. By addressing key issues like long-term user engagement and bias in training data, it paves the way for more intelligent, sensitive, and user-centric recommendation engines.
