- The paper introduces a counterfactual generative method that produces data-consistent saliency maps, outperforming traditional perturbation techniques.
- It leverages a model-agnostic approach with variational Bernoulli dropout and conditioned generative infilling to preserve contextual image data.
- Experiments demonstrate higher target probabilities and better alignment with human-labeled regions, enhancing overall classifier interpretability.
Overview of "Explaining Image Classifiers by Counterfactual Generation"
This paper presents a novel approach for explaining the predictions of image classifiers through the generation of counterfactuals. Traditional saliency map methods often suffer from limitations such as producing artifacts and ignoring the contextual pixel information within an image. The authors address these limitations with a framework that leverages strong conditional generative models to produce more plausible, data-consistent saliency maps.
Methodology
The paper introduces a counterfactual generation approach where the core question is reframed as: which image regions, when replaced by plausible alternative values, would cause the greatest change in the classifier's output? Unlike traditional approaches that infill masked regions with ad-hoc values such as blur or random noise, this framework infills them using conditional generative models, preserving the data distribution and the contextual relations within the image. Feature importance is computed for any differentiable classifier through a model-agnostic scheme based on variational Bernoulli dropout, with counterfactual inputs sampled from the conditional generative infiller (see the sketch below).
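To make this pipeline concrete, here is a minimal, illustrative PyTorch-style sketch of this kind of mask optimization. It is not the authors' implementation: `classifier`, `infiller`, the relaxation temperature, and the sparsity weight `lam` are assumed placeholders, and the loss shown is a generic "smallest destroying region"-style objective rather than the paper's exact formulation.

```python
# Illustrative sketch only: `classifier` and `infiller` are assumed to be
# user-supplied callables; the loss is a generic SDR-style objective.
import torch

def relaxed_bernoulli(logits, temperature=0.1):
    # Concrete (Gumbel-sigmoid) relaxation of Bernoulli samples so the
    # mask remains differentiable with respect to `logits`.
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log(1 - u)
    return torch.sigmoid((logits + noise) / temperature)

def explain(x, classifier, infiller, target_class, steps=300, lam=1e-3):
    # One keep/drop logit per pixel (a coarser per-patch mask would also work).
    logits = torch.zeros(1, 1, x.shape[2], x.shape[3], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)
    for _ in range(steps):
        z = relaxed_bernoulli(logits)              # soft keep mask in (0, 1)
        x_cf = z * x + (1 - z) * infiller(x, z)    # infill the dropped pixels
        prob = torch.softmax(classifier(x_cf), dim=1)[:, target_class]
        # Push the target probability down while dropping as few pixels
        # as possible (sparsity penalty on the dropped region).
        loss = prob.mean() + lam * (1 - z).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Pixels that are likely to be dropped are the salient ones.
    return 1 - torch.sigmoid(logits).detach()
```

Sampling a fresh mask at every step is what makes this a variational Bernoulli dropout scheme: the quantity being optimized is a distribution over masks rather than a single hard mask.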
The process involves masking image regions and infilling them using a conditional generative model trained to respect the original training data distribution. This method effectively marginalizes the masked areas by conditioning their generation on the information retained in the visible parts of the image. The saliency maps produced are thus more compact and feature fewer artifacts.
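In rough notation (the symbols below are illustrative, not taken verbatim from the paper), conditioning the infill on the visible pixels amounts to a Monte Carlo approximation of a marginalized classifier score:

\[
\hat{p}(c \mid x, z) \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p_{\mathcal{M}}\!\left(c \mid x_{z} \cup \tilde{x}^{(s)}_{\bar{z}}\right),
\qquad \tilde{x}^{(s)}_{\bar{z}} \sim G\!\left(x_{\bar{z}} \mid x_{z}\right),
\]

where \(x_{z}\) denotes the retained pixels, \(x_{\bar{z}}\) the masked region, \(G\) the conditional generative infiller, and \(p_{\mathcal{M}}\) the classifier. Heuristic infills such as blur or noise amount to replacing \(G\) with a fixed, data-agnostic distribution, which is precisely what pushes the composite image off the data manifold.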
Numerical Results and Claims
The authors provide strong empirical evidence for the approach. They quantitatively demonstrate that infilling with the Contextual Attention GAN (CA) outperforms heuristics such as the mean pixel value, Gaussian blur, and random noise at maintaining classifier confidence. For instance, when randomly masked regions of ImageNet images are infilled, the CA-GAN infills remain closer to the data distribution and yield higher classifier probabilities for the target class than the heuristic alternatives.
Another noteworthy claim is that the proposed method produces more focused saliency maps: the classifier's confidence can be driven down by manipulating fewer pixels than established methods require. Extensive experiments further show that the generated saliency maps align more closely with human-labeled bounding boxes in weakly supervised localization and achieve better saliency metric scores than competing approaches.
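For context, the saliency metric referenced above is the one proposed by Dabkowski & Gal (2017); as commonly stated (the clipping constant below is recalled from that work, not from this paper), lower values indicate better explanations:

\[
s(a, p) \;=\; \log\big(\max(a, 0.05)\big) \;-\; \log(p),
\]

where \(a\) is the area fraction of the smallest rectangular crop containing the salient region and \(p\) is the classifier's probability for the true class computed on that crop alone. A good explanation is therefore small (low \(a\)) yet sufficient for a confident prediction (high \(p\)).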
Implications and Future Work
This method has significant implications for interpretability in machine learning, particularly within computer vision, where understanding model decisions is often opaque. By focusing on likely counterfactuals rather than arbitrary perturbations, this method aligns better with human understanding and intuition of relevance within visual data.
Looking forward, improvements in generative modeling, such as advancements in GAN architectures, could enhance this framework further by improving the quality and diversity of counterfactual infillings. Integrating or comparing this method with complementary interpretability approaches could also offer more holistic insight into deep neural network behavior. The framework's adaptability to different classifiers suggests broad applicability across domains that demand high levels of interpretability.
Conclusion
The paper by Chang et al. introduces an innovative, generative approach to computing saliency maps that emphasize realistic counterfactuals, thereby offering more coherent and reliable explanations of image classifier decisions. Through methodological advancements and empirical validation, this work provides a tangible pathway to enhancing the transparency of AI systems in image classification and beyond, promising significant advancements in the field of AI interpretability.