Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks (1708.04617v1)

Published 15 Aug 2017 in cs.LG

Abstract: Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite their effectiveness, FM can be hindered by its modelling of all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, interactions with useless features may even introduce noise and adversely degrade performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, it is shown that on the regression task AFM betters FM with an $8.6\%$ relative improvement, and consistently outperforms the state-of-the-art deep learning methods Wide&Deep and DeepCross with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: https://github.com/hexiangnan/attentional_factorization_machine

Citations (857)

Summary

  • The paper integrates an attention mechanism into Factorization Machines to assign dynamic weights to feature interactions.
  • It demonstrates an 8.6% RMSE improvement on the Frappe dataset and outperforms deep models like Wide&Deep on MovieLens.
  • The AFM model is parameter-efficient, enhances interpretability, and mitigates overfitting via dropout and L2 regularization.

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

The paper "Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks" authored by Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua, provides significant advancements in the field of supervised learning, particularly in enhancing Factorization Machines (FMs) through the integration of an attention mechanism. Below is an insightful overview of their research:

Introduction and Motivation

Supervised learning is an integral domain within machine learning, with numerous applications encompassing recommendation systems, online advertising, and image recognition, among others. Traditional models like linear regression often fall short because they cannot effectively account for interactions between categorical features. Factorization Machines (FMs), introduced as a solution to this limitation, model second-order feature interactions by leveraging latent vectors. However, one critical disadvantage of conventional FMs is their uniform weighting of all feature interactions, which does not reflect the varying predictive significance of different interactions. This paper addresses this limitation by introducing the Attentional Factorization Machine (AFM), which employs a neural attention network to learn and scale the importance of feature interactions from the data.
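
For reference, the paper contrasts the standard FM predictor with the AFM predictor. The following equations restate them in the paper's notation, where $\mathbf{v}_i$ is the latent embedding of feature $i$, $\odot$ is the element-wise product, $a_{ij}$ is the learned attention weight of the pair $(i, j)$, and $\mathbf{p}$ projects the pooled interaction vector to a scalar:

```latex
% Standard FM: every second-order interaction contributes with an implicit, equal weight.
\hat{y}_{\mathrm{FM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

% AFM: each interaction is re-weighted by an attention score a_{ij} learned from data.
\hat{y}_{\mathrm{AFM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \mathbf{p}^{\top} \sum_{i=1}^{n} \sum_{j=i+1}^{n} a_{ij} \, (\mathbf{v}_i \odot \mathbf{v}_j) \, x_i x_j
```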

Core Contributions

AFM introduces an attention mechanism to FM, thereby allowing for differentiated weighting of feature interactions based on their predictive importance. The proposed model consists of several key components:

  • Pair-wise Interaction Layer: Mirrors the traditional FM by expanding the embedded feature vectors into interaction vectors, where each interaction vector is the element-wise product of a pair of feature embeddings.
  • Attention-based Pooling Layer: Implements an attention network, parameterized by a multi-layer perceptron (MLP), to compute attention scores for the interactions. These scores reflect the importance of each interaction and enable a more nuanced aggregation of the interaction terms (see the sketch after this list).
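
A minimal PyTorch sketch of these two layers follows. The module name `AFMInteractionPooling`, the layer sizes, and the use of `torch.triu_indices` to enumerate feature pairs are illustrative assumptions rather than details of the authors' released code:

```python
import torch
import torch.nn as nn

class AFMInteractionPooling(nn.Module):
    """Pair-wise interaction layer followed by attention-based pooling (illustrative sketch)."""

    def __init__(self, embed_dim: int = 16, attn_dim: int = 8):
        super().__init__()
        # Attention network: a one-hidden-layer MLP that scores each interaction vector.
        self.attn_mlp = nn.Sequential(nn.Linear(embed_dim, attn_dim), nn.ReLU())
        self.h = nn.Linear(attn_dim, 1, bias=False)   # maps hidden scores to a scalar
        self.p = nn.Linear(embed_dim, 1, bias=False)  # projects the pooled interaction vector

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, num_fields, embed_dim), one latent vector per non-zero feature.
        _, num_fields, _ = embeddings.shape
        idx = torch.triu_indices(num_fields, num_fields, offset=1)
        # Pair-wise interaction layer: element-wise products of all feature pairs.
        interactions = embeddings[:, idx[0], :] * embeddings[:, idx[1], :]
        # Attention-based pooling: softmax-normalised scores weight each interaction.
        scores = self.h(self.attn_mlp(interactions))       # (batch, num_pairs, 1)
        weights = torch.softmax(scores, dim=1)
        pooled = (weights * interactions).sum(dim=1)       # (batch, embed_dim)
        return self.p(pooled).squeeze(-1)                  # (batch,) second-order term
```

The scalar returned here is only the attention-pooled second-order term; in the full model it is added to the global bias and the first-order linear part, as in the equations above.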

Experimental Evaluation

To validate the effectiveness of AFM, extensive experiments were conducted on two real-world datasets: Frappe for context-aware recommendation, and MovieLens for personalized tag recommendation. Notably, AFM achieved a substantial performance boost over standard FMs and other state-of-the-art deep learning models like Wide&Deep and DeepCross.

  • Performance Metrics: On Frappe, AFM realized an 8.6% improvement over traditional FM in root mean square error (RMSE). Similarly, consistent performance enhancements were observed on MovieLens, where AFM outperformed Wide&Deep and DeepCross models, despite the latter's deeper and more complex architectures.
  • Parameter Efficiency: Despite having fewer parameters than models like Wide&Deep, AFM maintained superior prediction accuracy, highlighting its efficiency and effectiveness in modeling feature interactions.

Theoretical and Practical Implications

AFM presents several theoretical advancements and practical implications:

  1. Enhanced Model Interpretability: By assigning specific attention scores to feature interactions, AFM not only improves prediction accuracy but also provides insights into which feature interactions are most significant, thus improving model transparency.
  2. Improvement in Generalization: The integration of dropout in the pair-wise interaction layer, along with $L_2$ regularization on the attention network, fortifies the model against overfitting, enhancing its ability to generalize to unseen data (a minimal sketch of this regularization follows the list).
  3. Ease of Integration and Scalability: AFM's architecture can be seamlessly integrated into existing FM frameworks and extended to a variety of predictive tasks across different domains.
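
As a rough illustration of item 2, the training objective can be formed as squared loss plus an $L_2$ penalty that touches only the attention network's weights, with dropout applied to the pair-wise interaction vectors inside the model's forward pass. The function name `afm_loss` and the coefficient `lam` below are hypothetical:

```python
import torch
import torch.nn.functional as F

def afm_loss(pred: torch.Tensor, target: torch.Tensor,
             attn_weight: torch.Tensor, lam: float = 1e-2) -> torch.Tensor:
    """Squared loss plus an L2 penalty restricted to the attention network's weight matrix.

    `attn_weight` would be the weight of the Linear layer inside the attention MLP.
    Dropout on the interaction vectors would be applied in the model itself, e.g.
    interactions = F.dropout(interactions, p=0.2, training=self.training).
    """
    return F.mse_loss(pred, target) + lam * attn_weight.pow(2).sum()
```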

Future Directions

The prospects for future research stemming from AFM are multifaceted:

  • Exploration of deep AFM variants by incorporating additional non-linear layers could further amplify the model's capabilities in capturing complex feature interactions.
  • Investigation into optimal techniques for reducing computational complexity while maintaining high performance, such as leveraging learning-to-hash and data sampling strategies.
  • Extending AFM's framework to semi-supervised and multi-view learning, with a focus on methods like graph Laplacians and co-regularization to enhance predictive accuracy in more intricate scenarios.

In conclusion, AFM represents a pivotal step forward in the evolution of feature interaction learning in supervised models, offering significant improvements in predictive performance and model interpretability. Its broad applicability and efficient architecture position it as a highly valuable tool in the machine learning practitioner's toolkit.