- The paper introduces SmoothGrad, which improves noisy sensitivity maps by averaging gradients from multiple noise-perturbed images.
- Experimental results on an Inception v3 ImageNet classifier and a convolutional MNIST model demonstrate that SmoothGrad significantly enhances visual coherence and alignment with meaningful image features.
- The method integrates seamlessly with other gradient attribution techniques, offering practical benefits for model debugging and interpretability in sensitive applications.
Introduction to SmoothGrad Methodology
The paper "SmoothGrad: Removing Noise by Adding Noise" proposes a method to enhance the interpretability of sensitivity maps generated by deep image classification networks. These sensitivity maps, derived from the gradients of class activation functions, often suffer from visual noise. The authors introduce SmoothGrad, a technique that improves the clarity of these maps by averaging the maps derived from images perturbed with Gaussian noise.
Gradient-Based Sensitivity Maps
Gradient-based sensitivity maps, denoted as $M_c(x) = \partial S_c(x) / \partial x$, provide a fundamental method to elucidate the pixel-level importance of an input image $x$ for a classification score $S_c(x)$. Despite their utility in theory, these maps frequently appear visually noisy when presented to human observers (Figure 1).
Figure 1: A noisy sensitivity map, based on the gradient of the "gazelle" class score of an image classification network. Lighter pixels indicate partial derivatives with higher absolute values.
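As a concrete illustration, such a map can be computed with a few lines of autograd code. The sketch below assumes a PyTorch image classifier `model` whose forward pass returns class scores; `sensitivity_map` is an illustrative helper name, not from the paper.

```python
import torch

def sensitivity_map(model, x, target_class):
    """Return |dS_c/dx| for a single image tensor x of shape (1, C, H, W)."""
    x = x.clone().requires_grad_(True)   # track gradients w.r.t. the input pixels
    score = model(x)[0, target_class]    # class score S_c(x)
    score.backward()                     # populates x.grad with dS_c/dx
    # Collapse channels via the max absolute partial derivative, a common
    # convention when rendering saliency as a grayscale image.
    return x.grad.abs().amax(dim=1).squeeze(0)
```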
Several strategies have historically been applied to mitigate this noise, including Layerwise Relevance Propagation, Integrated Gradients, and Guided Backpropagation. These methods aim to refine the attribution of pixel importance so that it aligns more closely with intuitive human understanding.
Smoothing Noisy Gradients with SmoothGrad
The key insight of the SmoothGrad approach is recognizing that sharp fluctuations in partial derivatives may contribute to the noise observed in sensitivity maps. The method involves averaging the sensitivity maps generated from multiple noise-perturbed versions of the same image:
$\hat{M}_c(x) = \frac{1}{n} \sum_{i=1}^{n} M_c\left(x + \mathcal{N}(0, \sigma^2)\right)$
Figure 2: Effect of noise level (columns) on our method for 5 images of the gazelle class in ImageNet (rows). Each sensitivity map is obtained by applying Gaussian noise $\mathcal{N}(0, \sigma^2)$.
This stochastic approximation significantly enhances the visual coherence of sensitivity maps without necessitating changes to the network architecture.
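A minimal sketch of this averaging, reusing the `sensitivity_map` helper above (the noise scale `sigma` and sample count `n` are free parameters; the defaults here are illustrative):

```python
import torch

def smoothgrad_map(model, x, target_class, n=50, sigma=0.15):
    """Monte Carlo estimate of the smoothed map: average over n noisy copies of x."""
    total = torch.zeros(x.shape[-2:])             # accumulator of shape (H, W)
    for _ in range(n):
        noisy = x + sigma * torch.randn_like(x)   # x + N(0, sigma^2)
        total += sensitivity_map(model, noisy, target_class)
    return total / n
```

In the paper, the noise level is expressed as a fraction of the input's dynamic range; values around 10–20% are reported to work well, with diminishing returns beyond roughly 50 samples.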
Experimental Validation
Experiments conducted using models such as Inception v3 and a convolutional MNIST model validate the efficacy of SmoothGrad. By adjusting noise levels and sample sizes during inference, the authors demonstrate a marked improvement in the alignment of sensitivity maps with meaningful image features (Figure 3).
Figure 3: Effect of sample size on the estimated gradient for Inception. 10% noise was applied to each image.
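A hypothetical driver for this kind of comparison, sweeping both knobs with the `smoothgrad_map` sketch above (`model`, `x`, and `target_class` assumed to be defined):

```python
for sigma in (0.05, 0.10, 0.20):   # noise as a fraction of the pixel range
    for n in (10, 50, 100):        # number of noisy samples averaged
        smap = smoothgrad_map(model, x, target_class, n=n, sigma=sigma)
        # ... render or save smap for side-by-side visual comparison
```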
Qualitative comparisons indicate that SmoothGrad surpasses other baseline methods in providing visually coherent and discriminative sensitivity maps (Figure 4).
Figure 4: Qualitative evaluation of different methods. The first three (last three) rows show examples where applying SmoothGrad had high (low) impact on the quality of the sensitivity map.
Integration with Existing Methods
SmoothGrad can be combined with other gradient refinement methods like Integrated Gradients and Guided BackProp to further enhance sensitivity map clarity and coherence (Figure 5).
Figure 5: Using SmoothGrad in addition to existing gradient-based methods: Integrated Gradients and Guided BackProp.
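Because SmoothGrad only prescribes averaging over noisy inputs, the inner map $M_c$ can be any attribution method. The sketch below swaps in a minimal Integrated Gradients approximation (black baseline, a fixed number of interpolation steps); the helper names and defaults are illustrative, not the paper's reference implementation.

```python
import torch

def integrated_gradients_map(model, x, target_class, steps=25):
    """Riemann-sum approximation of Integrated Gradients from a black baseline."""
    baseline = torch.zeros_like(x)
    grad_sum = torch.zeros_like(x)
    for k in range(1, steps + 1):
        # Interpolate from the baseline toward x and collect the gradient there.
        point = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        model(point)[0, target_class].backward()
        grad_sum += point.grad
    attribution = (x - baseline) * grad_sum / steps   # (x - x') * avg path gradient
    return attribution.abs().amax(dim=1).squeeze(0)

def smooth_integrated_gradients(model, x, target_class, n=50, sigma=0.15):
    """SmoothGrad wrapper: average Integrated Gradients maps over noisy inputs."""
    maps = [integrated_gradients_map(model, x + sigma * torch.randn_like(x),
                                     target_class)
            for _ in range(n)]
    return torch.stack(maps).mean(dim=0)
```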
Implications and Future Directions
SmoothGrad represents a significant advance in sensitivity map visualization, offering an easily implemented means of reducing noise through perturbation and averaging. It has clear practical value for model debugging and for meeting interpretability requirements in sensitive domains such as healthcare.
Future research may explore deeper theoretical justifications for the efficacy of SmoothGrad, investigate differential impacts based on image texture and pixel distribution, and propose quantitative metrics for evaluating sensitivity maps. Additionally, examining the application of SmoothGrad across different architectures and tasks could broaden its utility.
Figure 6: Effect of noise level on the estimated gradient across 5 MNIST images. Each sensitivity map is obtained by applying Gaussian noise at inference time and averaging.
Conclusion
The SmoothGrad approach mitigates the limitations of noisy sensitivity maps by using noise itself as the remedy, yielding a simple technique applicable to any gradient-based saliency method. This use of stochastic sampling offers a promising avenue for future work on model interpretability and accountability in complex machine learning systems.