- The paper introduces a novel sequential recommendation model using bidirectional Transformer architecture that predicts masked user interactions with a Cloze task objective.
- It employs multi-head self-attention to capture contextual dependencies from both past and future user interactions, achieving significant improvements in HR, NDCG, and MRR metrics.
- The approach highlights the potential of leveraging rich, bidirectional context in recommendation systems to overcome the limitations of traditional unidirectional models.
The paper "BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer" (arXiv:1904.06690) introduces a novel approach to sequential recommendation. The work adapts the BERT architecture to user-item interaction prediction, capturing contextual information from both past and future interactions, unlike earlier models that rely solely on past interactions.
Motivation and Background
In traditional sequential recommendation systems, user behavior is typically modeled with unidirectional methods, such as RNNs, which predict the next interaction from past behavior alone. While effective, this approach often fails to capture the full context because it treats user interactions as a rigidly ordered sequence. In reality, user behavior need not follow such a strict order, since external factors can disrupt it and adjacent interactions are not always causally related. BERT4Rec addresses this limitation by adopting a bidirectional attention mechanism that allows for more robust modeling of user preferences.
BERT4Rec Model Architecture
At the core of BERT4Rec is the bidirectional self-attention mechanism, which is designed to capture the contextual dependencies between user interactions, similar to how BERT handles language text. The model introduces several notable features:
- Transformer Architecture: BERT4Rec employs a stack of Transformer layers, which use multi-head self-attention to allow each interaction in the user's history to incorporate information from any other interaction.
- Cloze Task Objective: The model is trained using a variant of the Cloze task, where random items in a user sequence are masked and then predicted based on their surrounding context. This setup prevents information leakage during training and enables capturing both sides of context around each interaction.
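The Cloze-style training objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the reserved `MASK_ID` of 0 and the 20% masking rate are assumptions for the sketch (the masking proportion is a tuned hyperparameter in the paper).

```python
import random

MASK_ID = 0      # reserved ID for the [mask] token; an assumption for this sketch
MASK_PROB = 0.2  # fraction of items to mask; a tunable hyperparameter in the paper

def cloze_mask(sequence, mask_prob=MASK_PROB, rng=random):
    """Randomly replace items with MASK_ID and return (masked_seq, labels).

    labels[i] holds the original item wherever a mask was placed, else None,
    so the loss is computed only over the masked positions."""
    masked, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)
            labels.append(item)   # the model must predict this from both-side context
        else:
            masked.append(item)
            labels.append(None)   # position is not supervised
    return masked, labels
```

Because the model only predicts the masked positions, each prediction can safely attend to the entire (partially masked) sequence without leaking the target item.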
Implementation Steps
- Data Preprocessing: Interaction histories are converted into sequences, where each sequence represents the interactions of a single user ordered by time. Only users with sufficient interaction data are considered.
- Model Training:
- The model uses a sequence of item embeddings augmented with positional embeddings to account for item order within the sequence.
- The Cloze task is simulated by randomly masking some items in the sequence, and the model predicts these masked items. This encourages learning distributed representations that depend on the context from both sides.
- Model Output: The final layer projects the Transformer's hidden states at the masked positions through a softmax over the item vocabulary to predict the masked item IDs; the output projection shares weights with the input item embedding matrix to reduce model size and mitigate overfitting.
- Model Optimization: The model is optimized using the Adam optimizer, with hyperparameters such as learning rate and dropout being tuned to prevent overfitting, especially in sparse datasets.
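The embedding, attention, and output steps above can be sketched in miniature with NumPy. This is a toy single-layer, single-head forward pass with random (untrained) weights and illustrative sizes chosen for the sketch, not the full stacked, multi-head model with feed-forward sublayers, layer normalization, and dropout; what it does show is the two defining choices: no causal mask (every position attends to both past and future) and a weight-tied softmax output over the item vocabulary.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_items, max_len, d = 50, 8, 16   # toy sizes; assumptions for this sketch

# Learnable tables, randomly initialised here (trained by Adam in practice)
item_emb = rng.normal(size=(num_items + 1, d))   # +1 row for the [mask] token at ID 0
pos_emb = rng.normal(size=(max_len, d))          # learned positional embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def forward(seq):
    """One bidirectional self-attention layer, then a softmax over the item
    vocabulary at every position. Note there is no causal mask: the attention
    matrix lets each position see the whole sequence."""
    h = item_emb[seq] + pos_emb[: len(seq)]      # item + positional embeddings
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))         # full (len, len) attention
    h = attn @ v
    # Weight tying: project back through the shared item embedding table
    return softmax(h @ item_emb.T)               # shape (len(seq), num_items + 1)

probs = forward(np.array([3, 0, 7, 12]))         # position 1 carries the [mask] token
```

At training time, only the rows of `probs` corresponding to masked positions would contribute to the cross-entropy loss.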
Experimental Results
BERT4Rec consistently outperforms traditional sequential recommendation models on benchmark datasets. The bidirectional self-attention captures nuanced user preferences more effectively than unidirectional models, and comparative evaluations show significant improvements in HR@k, NDCG@k, and MRR across datasets such as Amazon Beauty and MovieLens.
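For concreteness, the three reported metrics reduce to simple formulas once the model has ranked the candidate items and we know the rank of the held-out ground-truth item (rank 1 = best). A minimal sketch for the single-relevant-item case used in leave-one-out evaluation:

```python
import math

def hit_rate_at_k(rank, k):
    """HR@k: 1 if the held-out item appears in the top-k list, else 0."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    """NDCG@k with one relevant item: 1/log2(rank + 1) if in the top k, else 0.
    Rewards placing the item nearer the top, not just anywhere in the list."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def mrr(rank):
    """Mean reciprocal rank contribution for one user: 1/rank."""
    return 1.0 / rank
```

Averaging these per-user values over the test set yields the reported HR@k, NDCG@k, and MRR figures.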
Discussion and Future Work
The fusion of bidirectional context via self-attention networks in BERT4Rec significantly enhances the quality of sequential recommendations by integrating comprehensive user interaction histories. Because the Transformer layers process all positions in parallel, training scales well across a sequence, but self-attention's quadratic complexity in sequence length means computational cost must be managed carefully for long interaction histories.
Future work could enrich item representations in BERT4Rec with metadata such as categories or prices, and improve scalability given self-attention's computational cost. Extending the model to capture user profiles explicitly while handling multiple sessions is another promising avenue for further enhancement.