Attributed Graph Clustering via Adaptive Graph Convolution (1906.01210v1)

Published 4 Jun 2019 in cs.LG, cs.AI, and stat.ML

Abstract: Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.

Citations (271)

Summary

  • The paper introduces AGC, a novel method that adaptively selects the optimal k-order graph convolution to enhance feature smoothing for clustering.
  • AGC leverages iterative low-pass filtering using a symmetric normalized Laplacian to aggregate multi-hop neighborhood information and improve cluster compactness.
  • Experimental results on datasets like Cora and Citeseer show AGC outperforms fixed-order GCN methods by achieving higher accuracy, NMI, and F1 scores.

The paper "Attributed Graph Clustering via Adaptive Graph Convolution" (1906.01210) introduces the Adaptive Graph Convolution (AGC) method for attributed graph clustering. This technique aims to improve upon existing methods by leveraging higher-order graph convolutions to capture global cluster structures and by adaptively determining the optimal convolution order for a given graph, addressing the limitation of fixed, low-order convolutions commonly used in Graph Convolutional Network (GCN) based approaches.

Methodology: Adaptive Graph Convolution (AGC)

The core idea of AGC is to pre-process node features using a graph convolution operator specifically designed as a low-pass filter, thereby smoothing the features according to the graph topology. This smoothing encourages nodes within the same cluster (presumed to be densely connected) to have more similar feature representations. The process is decoupled from deep learning architectures; it's a feature transformation step followed by a standard clustering algorithm.

1. k-Order Low-Pass Graph Convolution:

AGC employs a specific graph filter derived from the symmetrically normalized Laplacian $L_s = I - D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix and $D$ is the degree matrix. The chosen base filter is $G = I - 0.5 L_s$. This filter is applied iteratively $k$ times to the original node feature matrix $X$:

$$\bar{X} = G^k X = (I - 0.5 L_s)^k X$$

This operation constitutes a $k$-order graph convolution. Each application of $G$ effectively averages a node's features with those of its immediate neighbors. Applying $G^k$ aggregates information from neighbors up to $k$ hops away, acting as a low-pass filter in the graph spectral domain. The frequency response of $G$ is $p(\lambda) = 1 - 0.5\lambda$, where $\lambda$ denotes an eigenvalue of $L_s$. Since $0 \le \lambda \le 2$ for $L_s$, the response $p(\lambda)$ is non-negative and non-increasing, satisfying the conditions for a low-pass filter that smooths the signal (node features) by attenuating high-frequency components associated with feature variations between adjacent nodes.
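
This propagation step is easy to reproduce outside any deep learning framework. The snippet below is a minimal sketch, assuming a SciPy sparse adjacency matrix `A` (N x N) and a dense NumPy feature matrix `X` (N x d); the function name and structure are illustrative, not taken from the paper's released code.

```python
import numpy as np
import scipy.sparse as sp

def smooth_features(A, X, k):
    """Apply the low-pass filter G = I - 0.5 * L_s to X, k times (k-order convolution)."""
    n = A.shape[0]
    deg = np.asarray(A.sum(axis=1)).ravel()
    with np.errstate(divide="ignore"):
        d_inv_sqrt = 1.0 / np.sqrt(deg)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0           # guard against isolated nodes
    D_inv_sqrt = sp.diags(d_inv_sqrt)
    L_s = sp.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt    # symmetrically normalized Laplacian
    G = sp.eye(n) - 0.5 * L_s                        # frequency response p(lambda) = 1 - lambda/2
    X_bar = np.asarray(X, dtype=float)
    for _ in range(k):
        X_bar = G @ X_bar                            # one hop of neighborhood averaging
    return X_bar
```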

2. Spectral Clustering on Smoothed Features:

After obtaining the smoothed feature matrix $\bar{X}$, a similarity matrix $W$ is constructed. The paper uses a linear kernel: $K = \bar{X} \bar{X}^T$. To ensure symmetry and non-negativity, the final similarity matrix is computed as:

$$W = 0.5 \,(|K| + |K^T|)$$

Standard spectral clustering is then applied to this similarity matrix $W$ to partition the nodes into $C$ clusters.
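
The similarity construction and the subsequent clustering can be sketched with scikit-learn's SpectralClustering on a precomputed affinity matrix; the helper below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_smoothed_features(X_bar, n_clusters, seed=0):
    """Linear-kernel similarity followed by spectral clustering on the smoothed features."""
    K = X_bar @ X_bar.T                          # linear kernel K = X_bar X_bar^T
    W = 0.5 * (np.abs(K) + np.abs(K.T))          # symmetric, non-negative similarity
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            assign_labels="kmeans", random_state=seed)
    return sc.fit_predict(W)
```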

3. Adaptive Order Selection:

A key component of AGC is the adaptive selection of the convolution order $k$. Using a fixed $k$ is suboptimal, as different graphs require different degrees of smoothing, and excessive smoothing (very large $k$) can merge distinct clusters. AGC determines $k$ iteratively:

Algorithm 1: Adaptive Order Selection for AGC

Input: Feature matrix X, Adjacency matrix A, Number of clusters C
Output: Cluster partition C^(k*)

1: Compute Ls = I - D^(-1/2) A D^(-1/2)
2: Initialize smoothed features bar(X)^(0) = X
3: Initialize k = 1
4: Initialize intra_dist_prev = infinity

5: loop
6:    Compute bar(X)^(k) = (I - 0.5 Ls) * bar(X)^(k-1)  // Eq. (10)
7:    Compute similarity W^(k) from bar(X)^(k) using linear kernel and symmetrization
8:    Perform Spectral Clustering on W^(k) to get partition C^(k)
9:    Compute intra-cluster distance intra(C^(k)) using Eq. (11):
       intra(C^(k)) = (1/N) * sum_{c=1 to C} sum_{i in Cluster c} || bar(x)_i^(k) - mean(bar(x)_j^(k) for j in Cluster c) ||^2
10:   if intra(C^(k)) > intra_dist_prev or k reaches max_iterations then
11:      k* = k - 1  // Select previous order
12:      Output C^(k*)
13:      break
14:   end if
15:   intra_dist_prev = intra(C^(k))
16:   k = k + 1
17: end loop

The algorithm iteratively increases the convolution order $k$, performs clustering, and calculates the average intra-cluster variance (distance) using the smoothed features $\bar{X}^{(k)}$. It stops and selects the order $k-1$ corresponding to the first local minimum of the intra-cluster distance. This marks the point where clusters are compact but further smoothing starts to blend them, increasing the variance within the resulting larger, merged clusters.
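
A compact sketch of this selection loop, reusing the smooth_features and cluster_smoothed_features helpers sketched above; the max_k cap is an assumed safety bound, not a value prescribed by the paper.

```python
import numpy as np

def intra_cluster_distance(X_bar, labels):
    """Mean squared distance of each node to its cluster centroid (Eq. (11)-style measure)."""
    total = 0.0
    for c in np.unique(labels):
        members = X_bar[labels == c]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return total / X_bar.shape[0]

def agc(A, X, n_clusters, max_k=60):
    """Increase k until the intra-cluster distance first rises, then return the previous partition."""
    prev_dist, prev_labels = np.inf, None
    X_bar = np.asarray(X, dtype=float)
    for k in range(1, max_k + 1):
        X_bar = smooth_features(A, X_bar, 1)                  # one more application of G
        labels = cluster_smoothed_features(X_bar, n_clusters)
        dist = intra_cluster_distance(X_bar, labels)
        if dist > prev_dist:                                  # first local minimum passed
            return prev_labels, k - 1
        prev_dist, prev_labels = dist, labels
    return prev_labels, max_k                                 # cap reached without an increase
```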

Theoretical Analysis

The paper provides theoretical justification for the feature smoothing effect.

Smoothness Quantification: The smoothness of a graph signal $f$ (a column of the feature matrix $X$) is measured using the graph Laplacian quadratic form, a discrete analogue of the Laplace-Beltrami operator:

$$\Omega\left(\frac{f}{\|f\|_2}\right) = \frac{f^T L f}{f^T f}$$

where $L$ is a graph Laplacian (e.g., $L_s$). Lower values indicate smoother signals, meaning connected nodes have more similar values.
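
As a quick illustration, this smoothness measure is simply a Rayleigh quotient and can be evaluated directly; the snippet assumes a precomputed Laplacian L (dense or sparse) and a one-dimensional signal f.

```python
import numpy as np

def smoothness(f, L):
    """Graph Laplacian quadratic form of the normalized signal: f^T L f / f^T f."""
    f = np.asarray(f, dtype=float)
    return float(f @ (L @ f) / (f @ f))
```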

Theorem 1: This theorem states that applying a graph filter $G$, whose frequency response $p(\lambda)$ is non-negative and non-increasing on the spectrum of the Laplacian, to a signal $f$ results in a smoother or equally smooth signal $\bar{f} = Gf$:

$$\Omega\left(\frac{\bar{f}}{\|\bar{f}\|_2}\right) \le \Omega\left(\frac{f}{\|f\|_2}\right)$$

Implication: Since the chosen filter $G = I - 0.5 L_s$ has a frequency response $p(\lambda) = 1 - 0.5\lambda$, which is non-negative and non-increasing for $\lambda \in [0, 2]$ (the range of eigenvalues of $L_s$), Theorem 1 applies. Applying $G^k$ means iteratively applying such a smoothing filter. Consequently, as $k$ increases, the features $\bar{X}$ become progressively smoother with respect to the graph structure. This aligns with the clustering objective, as nodes within dense subgraphs (putative clusters) should ideally have similar representations. The adaptive selection mechanism is motivated by the fact that excessive smoothing ($k \to \infty$) would make all node features converge to essentially the same value (related to the graph's principal eigenvector), destroying cluster structure.
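
The intuition behind Theorem 1 can be sketched in the graph Fourier basis. Writing $f = \sum_i a_i u_i$ in the eigenbasis $\{u_i\}$ of $L_s$ with eigenvalues $\lambda_i$, the filtered signal is $\bar{f} = Gf = \sum_i p(\lambda_i)\, a_i u_i$, so

$$\Omega\left(\frac{\bar{f}}{\|\bar{f}\|_2}\right) = \frac{\sum_i p(\lambda_i)^2 a_i^2 \lambda_i}{\sum_i p(\lambda_i)^2 a_i^2} \;\le\; \frac{\sum_i a_i^2 \lambda_i}{\sum_i a_i^2} = \Omega\left(\frac{f}{\|f\|_2}\right),$$

because a non-negative, non-increasing $p$ shifts relative spectral weight toward the low-frequency (small $\lambda_i$) components.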

Experimental Validation

AGC was evaluated on four standard benchmark datasets: Cora, Citeseer, Pubmed, and Wiki.

Performance Comparison: AGC was compared against various baselines:

  • Feature-only methods (k-means, spectral clustering on features).
  • Structure-only methods (spectral clustering on graph, DeepWalk, DNGR).
  • Attributed graph clustering methods (GAE, VGAE, MGAE, ARGE, ARVGE).

Results showed that AGC consistently achieved state-of-the-art or highly competitive performance across all datasets using standard metrics (Accuracy - Acc, Normalized Mutual Information - NMI, F1-score). Notably, AGC demonstrated significant improvements over GAE/VGAE and ARGE/ARVGE on Cora, Citeseer, and Pubmed. The paper attributes this to AGC's ability to leverage higher-order structural information via the adaptive k-order convolution, whereas baseline GCN-based methods typically rely on fixed 2 or 3-layer architectures (equivalent to 2 or 3-hop information aggregation).

Validation of Adaptive k: Experiments confirmed the effectiveness of the adaptive selection strategy. Plots showed that the automatically selected order $k^*$ (where the intra-cluster distance first increases) closely corresponded to the order yielding optimal or near-optimal clustering metrics (Acc, NMI, F1). The optimal order varied significantly across datasets (e.g., $k^*=12$ for Cora, $k^*=55$ for Citeseer, $k^*=60$ for Pubmed, $k^*=8$ for Wiki), underscoring the necessity of the adaptive approach rather than a fixed $k$.

Efficiency and Stability: AGC exhibited low variance across multiple runs. Computationally, it avoids the parameter-training overhead of deep learning models. The primary costs are the $k$ sparse matrix multiplications for feature smoothing (or dense matrix multiplications, depending on the implementation) and the spectral clustering step.

Implementation Considerations

Implementing AGC involves several key steps:

  1. Laplacian Computation: Calculate the symmetrically normalized Laplacian $L_s = I - D^{-1/2} A D^{-1/2}$. This requires computing the degree matrix $D$ from the adjacency matrix $A$. Care must be taken with sparse matrix representations for efficiency, especially for large graphs.
  2. k-Order Convolution: Implement the iterative update $\bar{X}^{(k)} = (I - 0.5 L_s)\, \bar{X}^{(k-1)}$, i.e., $k$ steps of feature propagation/smoothing. Using sparse matrix multiplication libraries (e.g., scipy.sparse in Python) is crucial for scalability. The complexity per iteration is roughly proportional to the number of edges (times the feature dimension $d$) if $A$ is sparse, or $O(N^2 d)$ for dense matrix multiplication. The total cost of this stage is $k$ times the per-iteration cost.
  3. Similarity Matrix: Compute $K = \bar{X}^{(k)} (\bar{X}^{(k)})^T$, which can be computationally intensive ($O(N^2 d)$), then compute $W = 0.5\,(|K| + |K^T|)$.
  4. Spectral Clustering: Apply spectral clustering to $W$. Standard implementations involve computing the top $C$ eigenvectors of the Laplacian derived from $W$, which typically takes $O(N^3)$ time for dense eigendecomposition, although faster methods exist (e.g., iterative solvers such as LOBPCG when only a few eigenvectors are needed, potentially reducing the cost toward $O(N^2)$ or less depending on sparsity and solver efficiency).
  5. Adaptive Loop: Enclose steps 2-4 within the loop described in Algorithm 1, calculating the intra-cluster distance at each step to find the optimal order $k^*$. The maximum value of $k$ needs consideration; the paper suggests stopping based on the first increase of the intra-cluster distance or on a maximum iteration count.

The overall complexity is significantly influenced by the selected order $k^*$, the graph size $N$, the feature dimension $d$, and the efficiency of the sparse matrix operations and the spectral clustering implementation. For very large $k^*$ or dense graphs, the computation can become substantial.
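
Putting the pieces together, a hypothetical end-to-end call on a toy graph might look as follows; the adjacency matrix and random features are purely illustrative, and loading a real dataset such as Cora is left out.

```python
import numpy as np
import scipy.sparse as sp

# Two loosely connected triangle-like groups; features are random placeholders.
A = sp.csr_matrix(np.array([[0, 1, 1, 0, 0, 0],
                            [1, 0, 1, 0, 0, 0],
                            [1, 1, 0, 1, 0, 0],
                            [0, 0, 1, 0, 1, 1],
                            [0, 0, 0, 1, 0, 1],
                            [0, 0, 0, 1, 1, 0]], dtype=float))
X = np.random.rand(6, 16)

labels, k_star = agc(A, X, n_clusters=2, max_k=10)
print("selected order k*:", k_star)
print("cluster labels:", labels)
```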

Conclusion

The AGC method provides an effective approach for attributed graph clustering by using a theoretically grounded low-pass graph filter to smooth node features over potentially high-order neighborhoods. Its adaptive mechanism for selecting the convolution order $k$ tailors the degree of smoothing to the characteristics of each graph, avoiding the limitations of fixed-order methods and demonstrating strong empirical performance on benchmark datasets. Decoupling feature smoothing from complex neural network training offers a potentially simpler and more efficient alternative for combining structural and attribute information for clustering.