- The paper introduces a hybrid model architecture that merges self-supervised learning with reinforcement learning to enhance sequential recommendations.
- It proposes two frameworks, SQN and SAC, which significantly improve metrics like Hit Rate and NDCG compared to traditional recommendation models.
- The approach demonstrates practical improvements in long-term user engagement by effectively managing implicit, off-policy feedback in e-commerce scenarios.
Self-Supervised Reinforcement Learning for Recommender Systems
The paper "Self-Supervised Reinforcement Learning for Recommender Systems" addresses the challenges associated with modeling sequential recommendations in session-based environments, focusing on improving long-term user engagements through self-supervised reinforcement learning (RL). This approach is positioned as an augmentation over standard recommendation techniques, particularly when traditional supervised learning falls short in leveraging comprehensive user-item interaction dynamics.
Problem Statement
Sequential recommendation involves heterogeneous user-item interactions, ranging from clicks to purchases. Current state-of-the-art supervised models capture long-term engagement poorly because they are trained only on immediate next-item feedback. RL in principle offers a better framework, since it learns a policy through interaction with the environment, but deploying it online is rarely acceptable in recommendation systems: exploration would repeatedly expose users to potentially irrelevant content. The practical alternative, off-policy learning from logged data, carries its own challenges, most notably the lack of negative feedback and the biases inherent in such datasets.
Proposed Approach
Self-Supervised Reinforcement Learning is proposed as a combination of RL with a conventional self-supervised (next-item prediction) recommendation model. The paper introduces two frameworks: Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). The primary contribution is a hybrid architecture in which a reinforcement learning head sits alongside the supervised head on a shared sequence encoder, allowing the model to optimize for distinct rewards, such as valuing purchases over mere clicks.
Key Components:
- Self-Supervised Head: trained with a cross-entropy loss on next-item prediction; it provides the dense gradient signal needed for stable learning.
- Reinforcement Learning Head: trained on the same shared state representation; it acts as a regularizer that steers the model toward long-term, reward-defined objectives such as purchases (see the sketch after this list).
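For concreteness, below is a minimal sketch of such a dual-head architecture, assuming PyTorch and a GRU sequence encoder; the class name, dimensions, and layer choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a shared sequence encoder with a
# self-supervised next-item head and a Q-value head on the same state.
import torch
import torch.nn as nn

class DualHeadRecommender(nn.Module):
    def __init__(self, num_items: int, embed_dim: int = 64, hidden_dim: int = 64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, embed_dim, padding_idx=0)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Self-supervised head: next-item logits trained with cross-entropy.
        self.supervised_head = nn.Linear(hidden_dim, num_items)
        # RL head: one Q-value per candidate item (action).
        self.q_head = nn.Linear(hidden_dim, num_items)

    def forward(self, item_seq: torch.Tensor):
        # item_seq: (batch, seq_len) of item ids; the final hidden state serves
        # as the session/state representation shared by both heads.
        emb = self.item_emb(item_seq)
        _, h_n = self.encoder(emb)
        state = h_n.squeeze(0)                      # (batch, hidden_dim)
        return self.supervised_head(state), self.q_head(state)
```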
Both frameworks are model-agnostic: they are plugged into existing sequential recommendation models, and their effectiveness is validated empirically on two e-commerce datasets.
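The joint objective can then be sketched as the sum of the next-item cross-entropy loss and a one-step TD loss whose reward distinguishes purchases from clicks. The helper below is a simplified illustration: the batch fields, the reward values `r_click` and `r_buy`, and the single target network are assumptions standing in for the paper's full double Q-learning setup.

```python
# Simplified SQN-style training objective: supervised cross-entropy plus a
# Q-learning TD loss with purchase/click rewards (hypothetical field names).
import torch
import torch.nn.functional as F

def sqn_loss(model, target_model, batch, r_click=0.2, r_buy=1.0, gamma=0.5):
    seq, next_seq = batch["seq"], batch["next_seq"]          # (batch, seq_len)
    target_item = batch["target"]                             # (batch,) item ids
    is_purchase = batch["is_purchase"].float()                # (batch,) 0/1 flags

    logits, q_values = model(seq)

    # Self-supervised head: standard next-item cross-entropy.
    loss_ce = F.cross_entropy(logits, target_item)

    # RL head: reward depends on the interaction type of the target item.
    reward = is_purchase * r_buy + (1.0 - is_purchase) * r_click
    with torch.no_grad():
        _, q_next = target_model(next_seq)
        td_target = reward + gamma * q_next.max(dim=1).values
    q_taken = q_values.gather(1, target_item.unsqueeze(1)).squeeze(1)
    loss_q = F.smooth_l1_loss(q_taken, td_target)

    # The Q-learning head regularizes the supervised head toward actions with
    # higher long-term reward (e.g., purchases over clicks).
    return loss_ce + loss_q
```

In practice the target network would be a periodically synced copy of the model, and terminal sessions would truncate the bootstrapped term; both details are omitted here for brevity.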
Experimental Insights
Experiments on the RC15 and RetailRocket datasets show that SQN and SAC, applied on top of the baseline models GRU4Rec, Caser, NextItNet, and SASRec, consistently improve Hit Rate and NDCG, with especially pronounced gains on purchase-related interactions. The results underscore the RL head's ability to steer recommendation lists toward higher-reward interactions while learning only from logged, off-policy data.
Implications and Future Directions
The proposed frameworks mark a notable step in the recommendation literature by showing how RL can be exploited under implicit-feedback, off-policy constraints. Practically, the reward definition is flexible, so the same machinery extends to settings such as promoting recommendation diversity or optimizing watch or dwell time in media recommendations.
From a theoretical standpoint, the key takeaway is the controlled interplay between the reinforcement learning head and the conventional supervised objective. Future research could explore alternative policy-based approaches, synergies with emerging generative models for simulating richer user behavior, and extensions to slate-based interaction, which would carry the framework into multivariate action spaces.