- The paper's main contribution is the Deep Hash Embedding (DHE) method that learns efficient embeddings without conventional embedding tables.
- It encodes high-cardinality categorical features using multiple hash functions and a deep neural network, significantly reducing model size.
- Experimental results on the MovieLens and Amazon datasets show that DHE matches state-of-the-art performance while scaling better to large and evolving vocabularies.
Overview of "Learning to Embed Categorical Features without Embedding Tables for Recommendation"
The paper "Learning to Embed Categorical Features without Embedding Tables for Recommendation" addresses a critical challenge in the domain of recommendation systems: efficiently embedding large-vocabulary categorical features without relying on traditional embedding tables. This research presents an innovative framework termed Deep Hash Embedding (DHE), which aims to mitigate the limitations of conventional methods that struggle with high-cardinality features and dynamically changing feature values.
Background
Embedding learning for categorical features, such as user and item IDs, plays a vital role in recommendation models like Matrix Factorization (MF) and Neural Collaborative Filtering (NCF). The standard approach uses an embedding table in which each feature value is assigned its own vector. While effective, this approach scales poorly as the vocabulary grows and cannot handle novel, unseen feature values, which are common in real-world settings such as user engagement on digital platforms.
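To make the scaling problem concrete, here is a back-of-the-envelope calculation of a conventional table's footprint; the vocabulary size and embedding width are illustrative choices, not figures from the paper:

```python
# Illustrative only: memory footprint of a conventional embedding table.
# The vocabulary size and embedding width below are hypothetical, not from the paper.
vocab_size = 10_000_000   # distinct user/item IDs
embedding_dim = 128       # embedding width

params = vocab_size * embedding_dim
print(f"{params:,} parameters ~ {params * 4 / 1e9:.1f} GB in float32")
# -> 1,280,000,000 parameters ~ 5.1 GB in float32
```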
Novel Contributions and Methodology
The proposed DHE framework diverges from traditional embedding learning by eliminating the embedding table entirely. Instead, it operates in two stages:
- Encoding with Hash Functions: Each categorical feature value is mapped to a dense identifier vector by applying multiple hash functions followed by a normalizing transformation. The encoding is deterministic and non-learnable, so it requires no storage (see the sketch after this list).
- Deep Neural Network (DNN) for Embedding Generation: The encoding vector is fed to a DNN that transforms it into the final embedding. The network's weights are trained end-to-end with the rest of the model and can continue to be updated as new data arrives.
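A minimal sketch of the two stages, assuming universal hash functions of the form h(x) = ((a*x + b) mod p) mod m and a small fully connected decoder; the widths, depth, and initialization below are illustrative rather than the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

k = 1024           # number of hash functions (encoding width)
m = 1_000_000      # hash output range
p = 2_147_483_647  # a large prime (2^31 - 1)
a = rng.integers(1, p, size=k)
b = rng.integers(0, p, size=k)

def dhe_encode(feature_id: int) -> np.ndarray:
    """Stage 1: deterministic, non-learnable encoding -> k values scaled to [-1, 1]."""
    hashed = (a * feature_id + b) % p % m
    return hashed / (m - 1) * 2.0 - 1.0   # roughly uniform in [-1, 1]

# Stage 2: a small MLP decoder mapping the k-dim encoding to a d-dim embedding.
d, hidden = 32, 256
W1 = rng.normal(0.0, 0.02, size=(k, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.02, size=(hidden, d)); b2 = np.zeros(d)

def dhe_embed(feature_id: int) -> np.ndarray:
    h = np.maximum(dhe_encode(feature_id) @ W1 + b1, 0.0)  # hidden layer (ReLU here for brevity)
    return h @ W2 + b2                                      # final d-dim embedding

print(dhe_embed(42).shape)  # (32,)
```

In a real model the decoder weights would be trained jointly with the recommendation objective; only the hash parameters stay fixed.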
The DHE approach brings multiple advantages:
- Reduced Model Size: Empirical results show DHE reaches comparable Area Under the Curve (AUC) with far fewer parameters, often a fraction (e.g., roughly one-quarter) of the size of a full one-hot embedding model (a rough comparison follows this list).
- Collision-free Embedding Generation: Unlike the hashing trick, which forces distinct feature values to share an embedding whenever they collide, DHE's dense multi-hash encoding is effectively unique per value and needs no table lookup.
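The parameter savings come from the fact that the decoder's size does not depend on the vocabulary. A rough count, again with hypothetical sizes rather than the paper's reported configurations:

```python
# Hypothetical sizes for illustration; not the paper's reported configurations.
vocab_size, d = 10_000_000, 32
table_params = vocab_size * d                       # one learned vector per ID

k, hidden = 1024, 512                               # DHE encoding width and hidden layer
dhe_params = k * hidden + hidden + hidden * d + d   # weights + biases; independent of vocab_size
print(f"table: {table_params:,}  DHE decoder: {dhe_params:,}")
```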
This methodological shift also raises several research questions explored in the paper, such as how the choice of encoding scheme, the number of hash functions, and the decoder network architecture affect DHE's performance.
Experimental Setup and Results
Extensive experiments were conducted on the MovieLens-20M and Amazon datasets using Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP) models. Key observations include:
- Performance: DHE matches or exceeds state-of-the-art performance across a range of model sizes compared with traditional embedding approaches.
- Scalability: Accuracy improves as the number of hash functions increases, and DHE outperforms other hashing-based methods whose quality is limited by collisions.
- Neural Network Configuration: Deep decoder architectures benefit from particular activations (e.g., Mish) and batch normalization, which improve trainability and practical deployment (a brief sketch of Mish follows this list).
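For reference, Mish is defined as mish(x) = x * tanh(softplus(x)); a minimal, numerically stable sketch:

```python
import numpy as np

def mish(x: np.ndarray) -> np.ndarray:
    """mish(x) = x * tanh(softplus(x)), with a numerically stable softplus."""
    softplus = np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))
    return x * np.tanh(softplus)

print(mish(np.array([-2.0, 0.0, 2.0])))  # approx [-0.25, 0.0, 1.94]
```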
Implications and Future Work
The DHE framework opens the door to significant improvements in embedding learning, particularly where feature cardinality is very large or the vocabulary changes over time. It also marks a shift toward using DNNs in place of lookup tables, trading memory for computation and positioning embedding models to benefit from ongoing advances in hardware accelerators.
Future research could extend the DHE methodology to multivalent features, joint modeling of multiple categorical inputs, or hybrid systems that balance table-based and network-based approaches. Exploring these directions could further solidify DHE's utility across diverse machine learning applications.