- The paper proposes GraphMAE, a masked graph autoencoder that uses masked feature reconstruction and a scaled cosine error to overcome the limitations of traditional GAEs.
- It introduces re-mask decoding and a single-layer GNN decoder to improve representation quality for node and graph classification.
- Experiments across 21 datasets show consistent accuracy gains on benchmarks such as Cora, PubMed, and Reddit, challenging the dominance of contrastive methods.
GraphMAE: Self-Supervised Masked Graph Autoencoders
Overview
The paper introduces GraphMAE, a self-supervised masked graph autoencoder aimed at advancing generative self-supervised learning (SSL) in graph representation learning. The authors observe that, despite the success of generative SSL in fields such as NLP and CV, epitomized by BERT and GPT, SSL on graphs has remained dominated by contrastive learning techniques. They address this gap by revisiting and refining the graph autoencoder (GAE) framework.
Key Contributions
Identification of Existing Challenges
The authors identify four challenges that hold back existing self-supervised GAEs:
- Overemphasis on Structure: Traditional GAEs focus heavily on link prediction objectives, which may not translate well to tasks like node and graph classification.
- Trivial Feature Reconstruction: Failure to disrupt input features can lead to learning trivial solutions.
- Ineffective Error Metrics: The commonly used mean squared error (MSE) is sensitive to feature-vector norms, which can make optimization unstable (see the snippet after this list).
- Limited Decoder Expression: The widespread use of simplistic decoders, such as MLPs, limits expressive capacity.
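To make the norm-sensitivity point concrete, here is a minimal NumPy snippet (ours, not from the paper): scaling a feature vector and its reconstruction by the same factor leaves the relative error unchanged, yet MSE grows quadratically with the norm while a cosine-based error does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=128)             # original node feature
z = x + 0.1 * rng.normal(size=128)   # imperfect reconstruction

def mse(a, b):
    return np.mean((a - b) ** 2)

def cosine_error(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for scale in (1.0, 10.0, 100.0):     # same relative error, different norms
    print(scale, mse(scale * x, scale * z), cosine_error(scale * x, scale * z))
# MSE grows quadratically with the norm; the cosine error stays constant,
# so high-norm features would dominate an MSE reconstruction loss.
```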
Proposed Solutions
The paper proposes GraphMAE with the following key design improvements:
- Masked Feature Reconstruction: Instead of reconstructing graph structure, a random subset of nodes has its features replaced by a learnable [MASK] token, and the model is trained to recover the original features.
- Scaled Cosine Error (SCE): A cosine-similarity-based reconstruction loss whose scaling exponent down-weights easy, already well-reconstructed samples, improving training stability over MSE (formula below).
- Re-mask Decoding: The encoder's output is masked again at the selected nodes before decoding, forcing the decoder to reconstruct features from neighborhood information rather than copying hidden states.
- Expressive Decoder Choice: A single-layer GNN replaces the conventional MLP decoder, so reconstruction can draw on each node's neighborhood (a minimal sketch follows this list).
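For reference, the scaled cosine error as defined in the paper, where \(\widetilde{\mathcal{V}}\) is the set of masked nodes, \(x_i\) the original feature of node \(v_i\), and \(z_i\) the decoder's reconstruction:

```latex
\mathcal{L}_{\mathrm{SCE}}
  = \frac{1}{|\widetilde{\mathcal{V}}|}
    \sum_{v_i \in \widetilde{\mathcal{V}}}
    \Bigl(1 - \frac{x_i^{\top} z_i}{\lVert x_i \rVert \,\lVert z_i \rVert}\Bigr)^{\gamma},
  \qquad \gamma \ge 1
```

Setting \(\gamma = 1\) recovers the plain cosine error; larger \(\gamma\) shrinks the gradient contribution of nodes that are already reconstructed well.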
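The following PyTorch sketch (ours, not the authors' code) ties the four ideas together in one training step: mask node features with a learnable [MASK] token, encode, re-mask the hidden states, decode with a single GNN layer, and apply the scaled cosine error on the masked nodes only. The names `GraphMAESketch`, `GCNLayer`, and the dense, symmetrically normalized `adj` matrix are our simplifying assumptions; the paper uses GAT or GIN layers depending on the task.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = act(A_hat @ H @ W)."""
    def __init__(self, d_in, d_out, act=True):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
        self.act = act
    def forward(self, adj, h):
        h = self.lin(adj @ h)
        return torch.relu(h) if self.act else h

class GraphMAESketch(nn.Module):
    def __init__(self, d_feat, d_hid, mask_rate=0.5, gamma=2.0):
        super().__init__()
        self.enc_mask_token = nn.Parameter(torch.zeros(d_feat))  # [MASK] token
        self.dec_mask_token = nn.Parameter(torch.zeros(d_hid))   # re-mask token
        self.encoder = GCNLayer(d_feat, d_hid)                   # could be deeper
        self.decoder = GCNLayer(d_hid, d_feat, act=False)        # single GNN layer
        self.mask_rate, self.gamma = mask_rate, gamma

    def forward(self, adj, x):
        n = x.size(0)
        masked = torch.rand(n, device=x.device) < self.mask_rate  # nodes to mask
        x_in = x.clone()
        x_in[masked] = self.enc_mask_token        # masked feature reconstruction
        h = self.encoder(adj, x_in).clone()
        h[masked] = self.dec_mask_token           # re-mask decoding
        z = self.decoder(adj, h)
        # scaled cosine error on masked nodes only
        cos = nn.functional.cosine_similarity(z[masked], x[masked], dim=-1)
        return ((1.0 - cos) ** self.gamma).mean()

# usage:
model = GraphMAESketch(d_feat=32, d_hid=16)
adj = torch.eye(8)                    # stand-in for a normalized adjacency
loss = model(adj, torch.randn(8, 32))
loss.backward()
```

The dense `GCNLayer` is only a stand-in to keep the sketch dependency-free; the key structural points are that the loss is computed only on masked nodes and that the decoder sees re-masked hidden states, so it must rely on neighborhood aggregation.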
Experimental Validation
GraphMAE was evaluated on 21 datasets spanning node classification, graph classification, and transfer learning. It consistently matched or outperformed contrastive SSL methods such as DGI, MVGRL, and BGRL, challenging the assumed superiority of contrastive approaches in graph learning.
Performance Highlights
GraphMAE demonstrated superior performance by:
- Achieving notable accuracy gains on benchmark datasets including Cora, PubMed, and ogbn-arxiv.
- Exceeding previous state-of-the-art self-supervised approaches and, in some settings, rivaling supervised baselines.
- Generalizing robustly on inductive benchmarks such as PPI and Reddit.
Implications and Future Directions
GraphMAE's masked feature reconstruction and scaled cosine error position it as a strong methodology for graph embedding and pre-training. The resulting autoencoders are effective across a multitude of tasks while avoiding the negative sampling and augmentation tuning that contrastive learning requires.
Future research could extend GraphMAE to a wider variety of graph structures and further optimize its architecture. Its underlying principles may also carry over to multi-modal data representations or enhance transfer learning strategies in other domains.
In summary, GraphMAE stands as a promising direction for generative self-supervised learning in graphs, underscoring the need to revisit and refine the foundational premises of autoencoders in this area.