Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

Published 18 Dec 2019 in cs.LG, cs.CV, and stat.ML | (1912.08795v2)

Abstract: We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network. We 'invert' a trained network (teacher) to synthesize class-conditional input images starting from random noise, without using any additional information about the training dataset. Keeping the teacher fixed, our method optimizes the input while regularizing the distribution of intermediate feature maps using information stored in the batch normalization layers of the teacher. Further, we improve the diversity of synthesized images using Adaptive DeepInversion, which maximizes the Jensen-Shannon divergence between the teacher and student network logits. The resulting synthesized images from networks trained on the CIFAR-10 and ImageNet datasets demonstrate high fidelity and degree of realism, and help enable a new breed of data-free applications - ones that do not require any real images or labeled data. We demonstrate the applicability of our proposed method to three tasks of immense practical importance -- (i) data-free network pruning, (ii) data-free knowledge transfer, and (iii) data-free continual learning. Code is available at https://github.com/NVlabs/DeepInversion

Authors (8)

Citations (510)

Summary

The paper presents a novel method that inverts CNNs to synthesize class-conditional images using internal Batch Normalization statistics.
A companion technique, Adaptive DeepInversion, increases image diversity by maximizing the Jensen-Shannon divergence between teacher and student outputs, which supports data-free knowledge transfer and model pruning.
Empirical tests on CIFAR-10 and ImageNet demonstrate competitive accuracy retention and effective data-free continual learning.

Data-Free Knowledge Transfer: DeepInversion Approach

This paper presents a novel method called DeepInversion, which synthesizes class-conditional images from a trained convolutional neural network (CNN) without access to the original training dataset. The methodology emphasizes a data-free approach to knowledge transfer, pruning, and continual learning, demonstrating robust applicability for neural network compression and adaptation tasks.

Methodology

DeepInversion focuses on inverting a trained network (teacher) to synthesize realistic input images from random noise. By leveraging intermediate feature statistics stored in Batch Normalization layers, it regularizes the distribution of feature maps and enhances the fidelity of generated images. The paper introduces a complementary technique, Adaptive DeepInversion, that increases image diversity by maximizing the Jensen-Shannon divergence between teacher and student network outputs.
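The two regularizers described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes a toy setting in which each sample is a flat list of per-channel values (a real CNN would also average over spatial positions), and the function names bn_feature_penalty and js_divergence are hypothetical. The first function penalizes the gap between the batch statistics of synthesized features and the running mean and variance stored in a Batch Normalization layer; the second computes the Jensen-Shannon divergence that Adaptive DeepInversion is described as maximizing.

```python
import math


def batch_mean_var(features):
    """Per-channel mean and biased variance over a batch.

    `features` is a list of samples; each sample is a list of channel values.
    (Toy layout: a real feature map would also be averaged over H and W.)
    """
    n = len(features)
    channels = len(features[0])
    means = [sum(s[c] for s in features) / n for c in range(channels)]
    variances = [
        sum((s[c] - means[c]) ** 2 for s in features) / n
        for c in range(channels)
    ]
    return means, variances


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bn_feature_penalty(features, running_mean, running_var):
    """Distance between batch statistics and stored BN statistics.

    Hypothetical helper: one layer's term of a feature-distribution
    regularizer; a full loss would sum this over all BN layers.
    """
    mean, var = batch_mean_var(features)
    return l2_distance(mean, running_mean) + l2_distance(var, running_var)


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 2.0]]
    # Batch statistics match the stored ones exactly, so the penalty is zero.
    print(bn_feature_penalty(feats, [2.0, 2.0], [1.0, 0.0]))
    # Teacher and student fully disagree, so the divergence reaches log(2).
    print(js_divergence([1.0, 0.0], [0.0, 1.0]))
```

In an actual inversion loop, the penalty would be minimized with respect to the input image while the teacher's weights stay fixed, and the divergence term would push synthesis toward images on which the student still disagrees with the teacher.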

Numerical and Empirical Evaluations

The authors evaluate their method on the CIFAR-10 and ImageNet datasets. Notable results include 224x224 high-quality images that are class-conditional and contextually accurate, as demonstrated in Figure 1 of the original study. In verification tests, images synthesized by DeepInversion are classified correctly by multiple models, and more often than images produced by earlier inversion methods such as DeepDream.

For data-free pruning, DeepInversion achieves performance comparable to state-of-the-art methods that rely on real datasets, retaining most of the original accuracy while substantially compressing the model.

Contributions to Knowledge Transfer and Continual Learning

In terms of knowledge transfer, the paper distills knowledge from a ResNet50v1.5 teacher trained on ImageNet into a new network trained entirely on synthesized images, attaining a top-1 accuracy of 73.8%. This is 3.46 percentage points below the teacher (which implies a teacher top-1 accuracy of about 77.26%) and highlights the method's effectiveness when the original dataset is unavailable.
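To make the distillation step concrete, here is a minimal sketch of a soft-target distillation loss in plain Python: the KL divergence between the teacher's and the student's softened class distributions for one synthesized image. It shows a common formulation of knowledge distillation; whether it matches the paper's exact loss is an assumption, and the temperature parameter and the function name distillation_loss are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over softened class distributions.

    Hypothetical helper: the loss for a single image; training would
    average it over a batch of synthesized images.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    # A student that reproduces the teacher's logits incurs zero loss.
    print(distillation_loss(teacher, teacher))
    # A student that prefers a different class incurs a positive loss.
    print(distillation_loss(teacher, [0.5, 1.0, 4.0]))
```

Working in log space avoids division by zero when a student probability underflows, which can happen with large logit gaps.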

In data-free continual learning, DeepInversion lets a network trained on one dataset incorporate new classes from a separate dataset without access to the original training data, outperforming previous approaches such as LwF.MC by a considerable margin.

Theoretical and Practical Implications

Theoretically, this work provides insight into the latent capacity of trained networks to encode and synthesize high-dimensional image data. Practically, it addresses concerns about data privacy and resource allocation, as it forgoes the need for original data in applications like knowledge transfer and network pruning.

Future Directions

The continued development of data-free synthesis methods could impact the deployment of machine learning models on edge devices, enabling efficient resource utilization. Potential advancements may explore optimizing the synthesis speed, further improving image diversity, and adapting the approach to non-image domains.

In conclusion, DeepInversion offers a transformative perspective on utilizing pre-trained models for data-free applications, facilitating more efficient and secure model adaptations in practical AI deployments.
