Class-Discriminative Attention Maps for Vision Transformers

(2312.02364)
Published Dec 4, 2023 in cs.CV, cs.AI, cs.LG, and stat.ML

Abstract

Interpretability methods are critical components for examining and exploring deep neural networks (DNN), as well as increasing our understanding of and trust in them. Vision transformers (ViT), which can be trained to state-of-the-art performance with a self-supervised learning (SSL) training method, provide built-in attention maps (AM). While AMs can provide high-quality semantic segmentation of input images, they do not account for any signal coming from a downstream classifier. We introduce class-discriminative attention maps (CDAM), a novel post-hoc explanation method that is highly sensitive to the target class. Our method essentially scales attention scores by how relevant the corresponding tokens are for the predictions of a classifier head. As an alternative to classifier outputs, CDAM can also explain a user-defined concept by targeting similarity measures in the latent space of the ViT. This allows for explanations of arbitrary concepts, defined by the user through a few sample images. We investigate the operating characteristics of CDAM in comparison with relevance propagation (RP) and token ablation maps (TAM), an alternative to pixel occlusion methods. CDAM is highly class-discriminative and semantically relevant, while providing implicit regularization of relevance scores. PyTorch implementation: https://github.com/lenbrocki/CDAM. Web live demo: https://cdam.informatism.com/

Overview

  • Introduces a new interpretability method for Vision Transformers called class-discriminative attention maps (CDAM), enhancing understanding of AI decisions.

  • CDAM refines attention maps by incorporating class-specific or concept-based relevance, yielding more intuitive explanations of model decisions.

  • Through gradients, CDAM separates targeted objects from the background and distinguishes them from other classes.

  • CDAM compares favorably with alternative methods such as relevance propagation and token ablation maps, demonstrating superior class discrimination and semantic consistency.

  • CDAM advances interpretability in AI, producing sparser, more focused visualizations with clearer separations between classes that align with human intuition about model decisions.

Interpretability in AI is an essential area that helps us understand, trust, and improve machine learning models, particularly deep neural networks (DNNs) like Vision Transformers (ViTs). Vision Transformers, which apply attention mechanisms initially designed for language processing, have shown impressive results in image recognition tasks. However, while their built-in attention maps capture image features in a way that often looks intuitive, these maps are not tied to specific output classes, which limits what humans can learn from them about a model's decisions.

To address this, a new method called class-discriminative attention maps (CDAM) has been introduced. CDAM refines the attention maps used in ViTs by incorporating class-specific signals from a downstream classifier or concept similarity measures. This can reveal which parts of an image are most relevant to the model when making decisions about certain classes or user-defined concepts, which not only enhances the interpretability of ViTs but also provides insights into how different concepts are represented within the model.

CDAM works by computing gradients of the classifier's output with respect to the token representations in the final layer of the transformer, before they are passed to the classifier head. This approach benefits from the high-quality object segmentation already present in attention maps, while adding information about class relevance. For instance, in addition to revealing evidence for a particular class, the method can also show counter-evidence. It is class-discriminative in the sense that it clearly separates targeted objects both from the background and from objects that belong to other classes.
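To make this concrete, the following is a minimal PyTorch sketch of that gradient step, not the authors' implementation (which is linked in the abstract). It assumes a timm ViT backbone, captures the output of the last transformer block with a forward hook, and scores each patch token by gradient times activation; the additional scaling by the class token's attention scores mentioned in the abstract is omitted here for brevity.

```python
import torch
import timm

# Backbone used for illustration; any ViT with a hookable final block works.
model = timm.create_model("vit_small_patch16_224", pretrained=True).eval()

captured = {}

def save_tokens(module, inputs, output):
    output.retain_grad()           # keep gradients w.r.t. the final-layer tokens
    captured["tokens"] = output    # shape: (batch, 1 + num_patches, dim)

hook = model.blocks[-1].register_forward_hook(save_tokens)

x = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed input image
logits = model(x)
target = int(logits.argmax(dim=-1))

logits[0, target].backward()       # gradient of the chosen class logit

tokens = captured["tokens"][0, 1:]          # patch tokens (drop the CLS token)
grads = captured["tokens"].grad[0, 1:]      # their gradients
relevance = (tokens * grads).sum(dim=-1)    # signed per-token class relevance
cdam_map = relevance.reshape(14, 14)        # 14x14 grid of 16x16 patches at 224px
hook.remove()
```

Because the relevance is signed, positive entries correspond to evidence for the target class and negative entries to counter-evidence, which is what makes the resulting maps class-discriminative.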

Moreover, CDAM can explain broader concepts defined by a handful of example images. This concept-based approach does not rely on classifier outputs. Instead, it targets a similarity measure between the latent representation of the input image and a 'concept vector' built from the examples, allowing the model to be probed for concepts it has not been explicitly trained to recognize.
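Continuing the sketch above, the concept-based variant can be illustrated by replacing the class logit with a similarity target. The choices below, taking the CLS token returned by timm's forward_features as the latent representation and a plain dot product as the similarity measure, are assumptions for illustration rather than the paper's exact setup.

```python
def latent_cls(model, images):
    # In recent timm versions forward_features returns the full token sequence;
    # the CLS token is assumed to sit at index 0.
    return model.forward_features(images)[:, 0]

concept_images = torch.randn(5, 3, 224, 224)    # a few user-provided example images
with torch.no_grad():
    concept_vector = latent_cls(model, concept_images).mean(dim=0)

hook = model.blocks[-1].register_forward_hook(save_tokens)   # reuse the hook above
x = torch.randn(1, 3, 224, 224)
similarity = latent_cls(model, x)[0] @ concept_vector        # similarity target
similarity.backward()   # gradients w.r.t. final-layer tokens; relevance as before
hook.remove()
```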

In comparison with other methods, such as relevance propagation (RP) and token ablation maps (TAM), CDAM shows strong semantic consistency and class-discrimination. RP is more class-discriminative than regular attention maps (AM), but CDAM and TAM provide clearer distinctions, with CDAM additionally showing implicit regularization and less noise. While RP and TAM are used for comparison, they do not serve as the absolute ground truth for feature relevance, since they provide different perspectives on the decision-making process.
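For reference, a token ablation map in this spirit can be sketched as below, continuing with the model, x, and target from the first snippet: zero out one patch token at a time before the transformer blocks and record the drop in the target class logit. The manual forward pass follows a common timm ViT layout (class token prepended, positional embeddings added, then blocks, norm, and head), and zeroing the token embedding is an assumed ablation strategy rather than necessarily the paper's exact procedure.

```python
@torch.no_grad()
def token_ablation_map(model, x, target_class):
    def head_from_patches(patch_tokens):
        # Re-runs the ViT from patch embeddings onward (assumed timm layout).
        cls = model.cls_token.expand(patch_tokens.shape[0], -1, -1)
        tokens = torch.cat([cls, patch_tokens], dim=1) + model.pos_embed
        tokens = model.norm(model.blocks(tokens))
        return model.head(tokens[:, 0])            # logits from the CLS token

    patches = model.patch_embed(x)                 # (1, num_patches, dim)
    baseline = head_from_patches(patches)[0, target_class]
    scores = torch.zeros(patches.shape[1])
    for i in range(patches.shape[1]):
        ablated = patches.clone()
        ablated[:, i] = 0                          # ablate a single token
        scores[i] = baseline - head_from_patches(ablated)[0, target_class]
    return scores.reshape(14, 14)                  # larger drop = more important

tam = token_ablation_map(model, x, target)
```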

The method is demonstrated to provide class-discriminative visualizations that align well with human intuition about which image regions matter for a given class, as shown by both qualitative visualizations and quantitative correlation assessments. CDAM stands out by not only producing sparser and more focused maps than AM but also showing clearer separations between targeted and non-targeted classes than RP.

In conclusion, CDAM represents a significant step forward in the interpretability of Vision Transformers. It can explain both classifier-based predictions and user-defined concepts, offering a versatile tool for understanding the complex representations within self-supervised ViTs. This method enhances the transparency and trust in AI-powered image recognition and offers a promising approach for examining and refining the sophisticated decision-making processes in advanced AI models.
