New Benchmarks for Learning on Non-Homophilous Graphs (2104.01404v2)

Published 3 Apr 2021 in cs.LG and cs.SI

Abstract: Much data with graph structures satisfy the principle of homophily, meaning that connected nodes tend to be similar with respect to a specific attribute. As such, ubiquitous datasets for graph machine learning tasks have generally been highly homophilous, rewarding methods that leverage homophily as an inductive bias. Recent work has pointed out this particular focus, as new non-homophilous datasets have been introduced and graph representation learning models better suited for low-homophily settings have been developed. However, these datasets are small and poorly suited to truly testing the effectiveness of new methods in non-homophilous settings. We present a series of improved graph datasets with node label relationships that do not satisfy the homophily principle. Along with this, we introduce a new measure of the presence or absence of homophily that is better suited than existing measures in different regimes. We benchmark a range of simple methods and graph neural networks across our proposed datasets, drawing new insights for further research. Data and codes can be found at https://github.com/CUAI/Non-Homophily-Benchmarks.

Summary

The paper introduces enhanced non-homophilous graph datasets sourced from diverse real-world networks to better evaluate learning models.
The paper proposes a new homophily measure that accounts for class imbalances, providing a more stable evaluation of graph structures.
Experimental results show that both traditional methods and specialized GNNs perform competitively, highlighting challenges in scalability and memory usage.

Understanding Benchmarks for Learning on Non-Homophilous Graphs

The paper "New Benchmarks for Learning on Non-Homophilous Graphs" addresses a gap in graph representation learning by proposing larger datasets and a new metric for evaluating methods designed for non-homophilous graphs. Graph machine learning benchmarks have largely relied on datasets with high homophily, a property where connected nodes tend to share similar attributes or labels. This paper instead focuses on non-homophilous graphs, where connected nodes often differ, and so broadens the settings in which graph learning methods are tested.
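To make the notion of homophily concrete, the sketch below computes the commonly used edge homophily ratio, that is, the fraction of edges whose endpoints share a label. The function name `edge_homophily` and the toy graph are illustrative assumptions and are not taken from the paper's code.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label.

    edges  : list of (u, v) pairs, one entry per undirected edge
    labels : labels[i] is the class of node i
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle 0-1-2-3-0 (hypothetical data).
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Alternating labels: every edge joins two different classes.
print(edge_homophily(cycle, [0, 1, 0, 1]))  # 0.0

# Two adjacent pairs: half of the edges join same-class nodes.
print(edge_homophily(cycle, [0, 0, 1, 1]))  # 0.5
```

A value near 1 indicates a strongly homophilous graph, while a value near 0 indicates that edges mostly connect nodes of different classes.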

Key Contributions

One of the primary contributions of the paper is a set of improved graph datasets with non-homophilous structure. The datasets come from real-world settings where node labels do not follow homophilous patterns, including social networks with gender as the node label, biological protein networks, and temporal citation networks. The authors criticize earlier non-homophilous datasets for their limited size and scope, and propose larger and more diverse datasets to better assess how learning models perform in non-homophilous settings.

In addition to the datasets, the authors present a new homophily measure designed to better capture the presence or absence of homophily in a graph. The metric accounts for class imbalance and gives a more stable indication of non-homophily than previous measures, which can be biased by the class distribution: when one class dominates, most edges connect same-class nodes by chance alone, so a simple edge-based ratio looks high even when the graph has no real preference for same-class links.
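The following is a minimal sketch of a class-balance-corrected homophily score in the spirit of the measure described above. It assumes the form (1 / (C - 1)) * sum over classes k of max(0, h_k - |C_k| / n), where h_k is the share of same-class edge endpoints among nodes of class k; this formula is stated here as an assumption and should be checked against the paper before use. The function name and toy graphs are hypothetical.

```python
from collections import Counter


def class_corrected_homophily(edges, labels):
    """Homophily score that discounts same-class edges expected from class size.

    Assumed form: (1 / (C - 1)) * sum_k max(0, h_k - |C_k| / n).
    """
    n = len(labels)
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    size = Counter(labels)
    same_deg = Counter()   # per class: edge endpoints whose neighbor shares the class
    total_deg = Counter()  # per class: all edge endpoints
    for u, v in edges:
        for a, b in ((u, v), (v, u)):  # count each undirected edge from both ends
            total_deg[labels[a]] += 1
            if labels[a] == labels[b]:
                same_deg[labels[a]] += 1
    score = 0.0
    for k in classes:
        if total_deg[k] == 0:
            continue  # class with only isolated nodes contributes nothing
        h_k = same_deg[k] / total_deg[k]
        score += max(0.0, h_k - size[k] / n)
    return score / (len(classes) - 1)


# Complete graph on 10 nodes with a 9-to-1 class split (hypothetical data).
complete = [(u, v) for u in range(10) for v in range(u + 1, 10)]
imbalanced = [0] * 9 + [1]

# Plain edge homophily looks high purely because class 0 dominates.
print(sum(imbalanced[u] == imbalanced[v] for u, v in complete) / len(complete))  # 0.8
# The corrected score reports no homophily beyond what class sizes explain.
print(class_corrected_homophily(complete, imbalanced))  # 0.0

# Two disjoint triangles, one per class: fully homophilous.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(class_corrected_homophily(triangles, [0, 0, 0, 1, 1, 1]))  # 1.0
```

The complete-graph example shows the failure mode that motivates the correction: edges are placed with no regard to labels, yet the uncorrected ratio is 0.8.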

Experimental Insights

The benchmarks in the paper span a variety of models, from simple graph-agnostic methods to graph neural networks (GNNs) designed specifically for low-homophily settings. Notably, simple methods that have traditionally been overlooked, such as two-hop label propagation and logistic regression on the adjacency matrix (LINK), are shown to perform competitively, which underscores the value of benchmarking a broad range of methods. Likewise, state-of-the-art non-homophilous GNNs perform strongly on many datasets, although they are constrained by scalability and memory usage.
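The idea behind LINK, treating each node's adjacency row as its feature vector and fitting logistic regression on it, can be sketched in a few lines. The version below is a simplified binary variant trained with plain gradient descent; the function names, hyperparameters, and toy graph are illustrative assumptions rather than the authors' implementation.

```python
import math


def train_link_binary(adj_rows, labels, train_idx, lr=0.5, epochs=200):
    """Fit binary logistic regression using adjacency rows as features."""
    n = len(adj_rows[0])
    w = [0.0] * n
    b = 0.0
    m = len(train_idx)
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for i in train_idx:
            z = b + sum(wj * xj for wj, xj in zip(w, adj_rows[i]))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - labels[i]              # gradient of log loss w.r.t. z
            for j, xj in enumerate(adj_rows[i]):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_link_binary(adj_rows, w, b):
    """Predict class 1 when the linear score is positive, else class 0."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0
            for row in adj_rows]


# Complete bipartite graph K_{4,4}: nodes 0-3 are class 0, nodes 4-7 are class 1.
# Every edge joins different classes, so the graph is fully non-homophilous.
adj = [[1 if (i < 4) != (j < 4) else 0 for j in range(8)] for i in range(8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_link_binary(adj, labels, train_idx=[0, 1, 4, 5])
print(predict_link_binary(adj, w, b))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the classifier learns from which nodes a node links to, not from whether neighbors share its label, it makes no homophily assumption. In this toy graph the held-out nodes 2, 3, 6, and 7 are classified correctly even though no edge connects two nodes of the same class.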

Through these evaluations, the paper shows how different models compare across non-homophilous settings. The analysis highlights the strengths of current methods and identifies where they fall short, pointing to areas for improvement.

Implications and Future Directions

The main implication of this paper is methodological: it gives graph representation learning a more reliable way to evaluate models in non-homophilous settings. In practice, the improved datasets and the homophily measure give researchers resources for developing models that adapt to graph topologies beyond typical homophilous networks.

On a theoretical level, the paper encourages revisiting existing graph learning frameworks and their assumptions about data structures. Future research could expand on these developments by exploring other graph tasks such as link prediction and clustering, incorporating non-homophily into these paradigms.

In conclusion, this work is an important step toward understanding and improving machine learning models on non-homophilous graphs. As the field evolves, the benchmarks provided here are likely to help shape new methods and more accurate analyses of complex networks.

GitHub

GitHub - CUAI/Non-Homophily-Benchmarks: [WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs (111 stars)