
Deep Interest Network for Click-Through Rate Prediction (1706.06978v4)

Published 21 Jun 2017 in stat.ML and cs.LG

Abstract: Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently, deep learning based models have been proposed that follow a similar Embedding&MLP paradigm. In these methods, large scale sparse input features are first mapped into low dimensional embedding vectors, then transformed into fixed-length vectors in a group-wise manner, and finally concatenated and fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, regardless of the candidate ads. This fixed-length vector becomes a bottleneck, making it difficult for Embedding&MLP methods to capture a user's diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model: Deep Interest Network (DIN), which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, greatly improving the expressive ability of the model. Besides, we develop two techniques: mini-batch aware regularization and a data adaptive activation function, which help in training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of the proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN has now been successfully deployed in the online display advertising system in Alibaba, serving the main traffic.

Citations (1,691)

Summary

  • The paper presents DIN, a model that overcomes fixed-length vector limitations by dynamically capturing user interests through a local activation unit.
  • DIN employs an attention mechanism to weight historical behaviors relative to candidate ads, enhancing the relevance of user representations.
  • Experimental results on the Amazon, MovieLens, and Alibaba datasets show consistent AUC gains, and online A/B tests at Alibaba report notable CTR and RPM improvements, confirming its industrial impact.

Deep Interest Network for Click-Through Rate Prediction

The paper "Deep Interest Network for Click-Through Rate Prediction" addresses the critical task of click-through rate (CTR) prediction in online advertising systems, with a specific focus on overcoming the limitations of existing deep learning models. The authors propose a novel model called Deep Interest Network (DIN), which adeptly captures the diverse interests of users derived from their rich historical behaviors. This model was developed and deployed by Alibaba, one of the largest e-commerce and advertising platforms globally.

Problem and Motivation

Most deep learning-based CTR prediction models follow a common Embedding&MLP paradigm: large-scale sparse input features are mapped into low-dimensional embeddings, pooled group-wise into fixed-length vectors, and fed into a multilayer perceptron (MLP) to learn nonlinear feature interactions. However, this compresses a user's features into a single fixed-length vector regardless of the candidate ad, which constrains the model's ability to express diverse user interests, especially in industrial systems where behavior data is extensive and varied.
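To make the baseline concrete, here is a minimal sketch of the Embedding&MLP paradigm in PyTorch. It is illustrative only: the vocabulary size, embedding dimension, hidden-layer sizes, and the use of a single item-ID feature group are assumptions, not details from the paper. Note how the pooled user vector is the same no matter which candidate ad is scored.

```python
import torch
import torch.nn as nn

class EmbeddingMLPBaseline(nn.Module):
    """Minimal Embedding&MLP baseline: sparse IDs -> embeddings -> sum pooling -> MLP.
    Sizes and the single item-ID feature group are illustrative assumptions."""

    def __init__(self, num_items=100_000, emb_dim=16, hidden=(200, 80)):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim, padding_idx=0)
        layers, in_dim = [], emb_dim * 2  # pooled history + candidate ad embedding
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, hist_item_ids, candidate_item_id):
        # hist_item_ids: (B, T) padded behavior sequence; candidate_item_id: (B,)
        hist = self.item_emb(hist_item_ids)        # (B, T, E)
        user_vec = hist.sum(dim=1)                 # fixed-length pooling, identical for every ad
        ad_vec = self.item_emb(candidate_item_id)  # (B, E)
        logit = self.mlp(torch.cat([user_vec, ad_vec], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)    # predicted CTR
```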

Proposed Solution: Deep Interest Network (DIN)

The core innovation in DIN is the introduction of a local activation unit that dynamically learns the representation of user interests from their historical behaviors with respect to a specific advertisement. The representation vector is not fixed; it varies for different ads, significantly enhancing the model's expressive capability.

DIN uses an attention mechanism that computes the relevance of each historical behavior to the candidate ad and aggregates the behaviors with these relevance weights. This relevance-aware pooling ensures that the behaviors most pertinent to the ad under consideration dominate the user interest vector.
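The sketch below illustrates the local activation unit under the same assumptions as before. The scoring network's hidden size is illustrative, and feeding the element-wise product as the interaction term is an assumption; the paper feeds the behavior embedding, the candidate ad embedding, and an interaction of the two into a small feed-forward network, and notably does not normalize the resulting weights with a softmax.

```python
import torch
import torch.nn as nn

class LocalActivationUnit(nn.Module):
    """Scores each historical behavior against the candidate ad and pools the
    behaviors with those scores (sketch; hidden size and inputs are illustrative)."""

    def __init__(self, emb_dim=16, hidden=36):
        super().__init__()
        # Inputs: behavior emb, candidate emb, and their element-wise product (assumed).
        self.att_net = nn.Sequential(
            nn.Linear(emb_dim * 3, hidden),
            nn.ReLU(),                      # the paper uses PReLU/Dice here
            nn.Linear(hidden, 1),
        )

    def forward(self, hist, ad, mask):
        # hist: (B, T, E) behavior embeddings; ad: (B, E); mask: (B, T), 1.0 for real behaviors
        ad_exp = ad.unsqueeze(1).expand_as(hist)                    # (B, T, E)
        att_in = torch.cat([hist, ad_exp, hist * ad_exp], dim=-1)   # (B, T, 3E)
        scores = self.att_net(att_in).squeeze(-1)                   # (B, T)
        scores = scores * mask                                      # zero out padded positions
        # DIN intentionally skips softmax normalization, so the pooled vector's
        # magnitude reflects how much of the history matches the candidate ad.
        user_vec = torch.bmm(scores.unsqueeze(1), hist).squeeze(1)  # (B, E) weighted sum
        return user_vec
```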

Key components of DIN include:

  • Local Activation Unit: This unit weighs user behaviors based on their relevance to the candidate ad, generating an adaptively varying representation vector.
  • Mini-batch Aware Regularization: To curb overfitting when training large-scale deep networks, this technique restricts the L2 penalty to the embedding parameters of features that actually appear in the current mini-batch, keeping regularization computationally feasible (a sketch follows this list).
  • Data Adaptive Activation Function (Dice): Generalizing the PReLU activation function, Dice adapts the rectification point to the input distribution, improving the training of industrial networks with sparse features (also sketched after this list).
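A minimal sketch of the mini-batch aware regularization idea, assuming a PyTorch embedding table and precomputed global feature-occurrence counts; the function and variable names are hypothetical, not from the paper's code:

```python
import torch

def mini_batch_aware_l2(embedding_weight, batch_feature_ids, feature_counts, lam=0.01):
    """L2 penalty restricted to embedding rows appearing in the current mini-batch,
    each scaled by the inverse of its global frequency (sketch of the paper's idea).

    embedding_weight:  (V, E) embedding table (nn.Parameter)
    batch_feature_ids: 1-D LongTensor of feature ids occurring in this batch
    feature_counts:    (V,) tensor of global occurrence counts n_j per feature
    """
    ids = torch.unique(batch_feature_ids)            # penalize each appearing feature once
    rows = embedding_weight[ids]                     # (K, E)
    inv_freq = 1.0 / feature_counts[ids].clamp(min=1).float()
    penalty = (inv_freq * rows.pow(2).sum(dim=1)).sum()
    return lam * penalty                             # add to the CTR loss before backward()
```

The returned penalty would be added to the CTR loss before backpropagation, so only the embedding rows touched by the mini-batch receive an L2 gradient, each down-weighted by how frequently that feature occurs.

Below is a sketch of the Dice activation, f(s) = p(s)*s + (1 - p(s))*alpha*s with p(s) = sigmoid((s - E[s]) / sqrt(Var[s] + eps)), where E[s] and Var[s] are computed over the mini-batch during training (moving averages at inference). Realizing this with a non-affine BatchNorm is a common implementation choice assumed here, not the authors' code:

```python
import torch
import torch.nn as nn

class Dice(nn.Module):
    """Data-adaptive activation: a soft rectifier whose switching point follows
    the mini-batch mean and variance of its input (sketch)."""

    def __init__(self, num_features, eps=1e-8):
        super().__init__()
        # Non-affine BatchNorm supplies (s - E[s]) / sqrt(Var[s] + eps).
        self.bn = nn.BatchNorm1d(num_features, eps=eps, affine=False)
        self.alpha = nn.Parameter(torch.zeros(num_features))

    def forward(self, s):
        # s: (B, num_features) pre-activation values
        p = torch.sigmoid(self.bn(s))                # indicator smoothed into a probability
        return p * s + (1.0 - p) * self.alpha * s
```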

Experimental Validation

The paper provides comprehensive experimental validation across three datasets: Amazon, MovieLens, and a proprietary Alibaba production dataset, showcasing the effectiveness of DIN.

  • Amazon Dataset: With user behavior data from product reviews, DIN achieved a noticeable improvement in AUC, particularly benefiting from its design to handle rich, multi-faceted user behaviors.
  • MovieLens Dataset: Similar trends were observed in the context of movie ratings, where DIN outperformed traditional models.
  • Alibaba Dataset: Given the massive scale, the experiments demonstrated that DIN significantly outperforms existing state-of-the-art models, yielding up to a 3.8% improvement in Revenue Per Mille (RPM) and a 10% increase in CTR in online A/B tests.

Implications and Future Work

This paper makes several critical contributions to the field:

  1. Enhanced User Representation: By designing a mechanism that adapts user interest representation dynamically, DIN removes the constraints imposed by fixed-length vectors, offering a more nuanced understanding of user preferences.
  2. Scalable Solutions for Industrial Applications: The proposed regularization techniques make it feasible to train large-scale models without overfitting, addressing a significant challenge in practical deployments.
  3. Adaptability of Activation Functions: The Dice activation function showcases how adaptive methods can improve the convergence and performance of deep networks in sparse data scenarios.

While the empirical results substantiate the effectiveness of DIN, there are avenues for further research. Future studies can explore more sophisticated attention mechanisms or sequential models to handle the temporal dynamics of user behavior. Additionally, extending these methods to other high-stakes domains outside e-commerce, such as healthcare recommendations or financial fraud detection, could be of immense value.

In conclusion, the introduction of the Deep Interest Network marks a significant step forward in CTR prediction, demonstrating a clear pathway for leveraging deep learning to better capture and utilize user behavior data in large-scale industrial applications.
