Distilling Knowledge from Graph Convolutional Networks (2003.10477v4)

Published 23 Mar 2020 in cs.CV

Abstract: Existing knowledge distillation methods focus on convolutional neural networks (CNNs), where the input samples like images lie in a grid domain, and have largely overlooked graph convolutional networks (GCN) that handle non-grid data. In this paper, we propose to our best knowledge the first dedicated approach to distilling knowledge from a pre-trained GCN model. To enable the knowledge transfer from the teacher GCN to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher. In this module, the local structure information from both the teacher and the student are extracted as distributions, and hence minimizing the distance between these distributions enables topology-aware knowledge transfer from the teacher, yielding a compact yet high-performance student model. Moreover, the proposed approach is readily extendable to dynamic graph models, where the input graphs for the teacher and the student may differ. We evaluate the proposed method on two different datasets using GCN models of different architectures, and demonstrate that our method achieves the state-of-the-art knowledge distillation performance for GCN models. Code is publicly available at https://github.com/ihollywhy/DistillGCN.PyTorch.

Citations (208)

Summary

  • The paper introduces the LSP module to preserve and transfer intrinsic topological information from a teacher GCN to a compact student model.
  • It represents each node's local structure as a probability distribution over its neighbors and minimizes the Kullback-Leibler divergence between the teacher's and student's distributions, enabling topology-aware knowledge distillation.
  • Experimental results show a 60-90% reduction in model size with comparable accuracy, validating the approach on tasks like node classification and 3D object recognition.

Distilling Knowledge from Graph Convolutional Networks: A Comprehensive Overview

The paper "Distilling Knowledge from Graph Convolutional Networks" introduces a novel approach specifically targeting the distillation of knowledge from Graph Convolutional Networks (GCNs). Traditional knowledge distillation methods primarily focus on Convolutional Neural Networks (CNNs), neglecting the unique topological data handled by GCNs. This research presents a pioneering strategy to efficiently transfer the intrinsic topological knowledge embedded in GCNs from a pre-trained teacher model to a more compact student model using a Local Structure Preserving (LSP) module.

Methodological Contributions

A key innovation in this work is the LSP module, which facilitates the knowledge transfer process by retaining the topological semantics of the teacher GCN. The LSP module explicitly captures the local structure information of both the teacher and student models as probability distributions, thereby enabling topology-aware knowledge transfer. This method considers both the node features and their topological interconnections, ensuring that the student model mirrors the structure captured by the teacher.

The research delineates the LSP module's operation: the local structure around each node is mapped into a distribution, and the Kullback-Leibler divergence between the teacher's and student's distributions is then minimized. This addresses a shortcoming of existing distillation methods, which, when applied to non-grid data such as graphs, inadequately account for the rich topological information inherent in graph-based representations.
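To make this concrete, below is a minimal PyTorch-style sketch of such a local-structure loss. The function names (`local_structure`, `lsp_loss`), the RBF-kernel similarity, the bandwidth `sigma`, and the KL direction are illustrative assumptions, not the authors' exact implementation (which is available in the linked repository).

```python
import torch

def local_structure(features, edge_index, sigma=1.0):
    """Distribution over each node's outgoing edges, built from feature similarity.

    features:   [N, D] node embeddings from one model (teacher or student)
    edge_index: [2, E] directed edges (source -> neighbor)
    """
    src, dst = edge_index
    # Illustrative RBF-kernel similarity between a node and each of its neighbors
    sim = torch.exp(-((features[src] - features[dst]) ** 2).sum(-1) / (2 * sigma ** 2))
    # Normalize per source node so the similarities form a probability distribution
    denom = torch.zeros(features.size(0), device=features.device).index_add_(0, src, sim)
    return sim / denom[src].clamp_min(1e-12)

def lsp_loss(f_student, f_teacher, edge_index):
    """KL divergence between student and teacher local-structure distributions,
    averaged over nodes; the KL direction here is an illustrative choice."""
    p_s = local_structure(f_student, edge_index)
    p_t = local_structure(f_teacher, edge_index)
    kl_per_edge = p_s * (torch.log(p_s + 1e-12) - torch.log(p_t + 1e-12))
    return kl_per_edge.sum() / f_student.size(0)
```

In a training loop, a loss of this form would typically be added to the student's task loss with a weighting coefficient, so the student fits the labels while also matching the teacher's local structure.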

Further, the proposed methodology demonstrates versatility by extending to dynamic graph models. This adaptability is crucial because it handles cases where the input graph seen by the student differs from the teacher's, broadening the method's applicability across GCN architectures (see the sketch below).
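Building on the sketch above, one simple way to handle the dynamic-graph case, assumed here purely for illustration rather than taken from the paper, is to build each model's k-NN graph in its own feature space and evaluate the loss over the union of the two edge sets:

```python
import torch

def knn_edges(features, k=8):
    """Directed k-NN edges in feature space, excluding self-loops."""
    dist = torch.cdist(features, features)        # [N, N] pairwise distances
    dist.fill_diagonal_(float("inf"))
    dst = dist.topk(k, largest=False).indices     # [N, k] nearest neighbors per node
    src = torch.arange(features.size(0)).unsqueeze(1).expand_as(dst)
    return torch.stack([src.reshape(-1), dst.reshape(-1)])

# Toy embeddings standing in for teacher and student outputs on the same nodes
f_teacher = torch.randn(1024, 256)
f_student = torch.randn(1024, 64)

# Teacher and student may induce different graphs; merge their edge sets
edges = torch.cat([knn_edges(f_teacher), knn_edges(f_student)], dim=1).unique(dim=1)
# loss = lsp_loss(f_student, f_teacher, edges)  # lsp_loss from the sketch above
```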

Experimental Validation

The effectiveness of this distillation approach is validated through comprehensive experiments on datasets such as Protein-Protein Interaction (PPI) for node classification and ModelNet40 for 3D object recognition. The results indicate that the method achieves state-of-the-art knowledge distillation performance for GCNs, outperforming adaptations of techniques such as KD, FitNet, and Attention Transfer.

Specifically, the student models trained with the proposed method closely approach the teachers' performance while maintaining a much smaller computational footprint, with a 60-90% reduction in model size at comparable accuracy. This demonstrates the practical utility of the LSP module in producing efficient yet high-performing student models.

Implications and Future Directions

This paper's work carries substantial implications for the field of model compression, particularly in the context of deploying GCNs on resource-constrained devices. The LSP module's ability to preserve and transfer topological information opens avenues for improved efficiency in developing agile, lightweight GCN models without excessive performance trade-offs.

Potential future research directions could explore the extension of the LSP methodology to more complex GCN variants and other forms of dynamic and evolving networks. Additionally, integrating this structure-preserving approach into broader AI systems may accelerate advancements in fields utilizing graph-based data, such as social network analysis, molecular chemistry, and more.

In conclusion, the introduction of a structured, topology-aware knowledge distillation mechanism for GCNs marks a significant step forward in model efficiency and scalability. The insights and results from this research hold promise for optimizing the deployment of graph-based models across diverse computational environments.