Cause and Effect: Hierarchical Concept-based Explanation of Neural Networks (2105.07033v2)
Abstract: In many scenarios, human decisions are explained in terms of high-level concepts. In this work, we take a step toward the interpretability of neural networks by examining their internal representations, i.e., neuron activations, against concepts. A concept is characterized by a set of samples that share specific features. We propose a framework for checking the existence of a causal relationship between a concept (or its negation) and the task classes. While previous methods focus on the importance of a concept to a task class, we go further and introduce four measures to quantitatively determine the order of causality. Moreover, we propose a method for constructing a hierarchy of concepts in the form of a concept-based decision tree, which can shed light on how various concepts interact inside a neural network toward predicting the output classes. Through experiments, we demonstrate the effectiveness of the proposed method in explaining the causal relationship between a concept and the predictive behaviour of a neural network, as well as in determining the interactions between different concepts via the constructed concept hierarchy.
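The abstract does not spell out the four causal measures, so the sketch below only illustrates the general recipe such concept-based explanation methods build on: fit a linear probe that separates concept from non-concept activations, then intervene on the resulting concept direction and observe how the class predictions shift. The toy activations, the probe, and the downstream head `W_head` are all hypothetical stand-ins for a real trained network, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for hidden-layer activations of a trained network.
# In practice these would be collected via a forward hook on a real model.
d = 16
concept_acts = rng.normal(loc=1.0, size=(200, d))  # samples exhibiting the concept
random_acts = rng.normal(loc=0.0, size=(200, d))   # random counterexamples

# 1) Fit a linear probe separating concept from non-concept activations.
X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 200 + [0] * 200)
probe = LogisticRegression(max_iter=1000).fit(X, y)
v = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # unit concept direction

# Hypothetical downstream head mapping activations to three class logits.
W_head = rng.normal(size=(d, 3))

def class_probs(acts):
    """Softmax over the (toy) class logits for a batch of activations."""
    logits = acts @ W_head
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# 2) Intervene: project the concept component out of the activations and
#    measure how the mean class probabilities shift. A large, consistent
#    shift is (informal) evidence of a concept -> class link; the paper's
#    four measures refine this into an order of causality.
test_acts = rng.normal(loc=0.5, size=(500, d))
ablated = test_acts - (test_acts @ v)[:, None] * v  # remove concept direction

effect = class_probs(test_acts).mean(axis=0) - class_probs(ablated).mean(axis=0)
print("mean per-class probability shift after concept ablation:", effect)
```

Repeating such per-concept effect estimates, and splitting the data on the most causally influential concept first, is one plausible way a concept-based decision tree like the one described above could be grown.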