Entropy-based Logic Explanations of Neural Networks (2106.06804v4)

Published 12 Jun 2021 in cs.AI, cs.CV, cs.LG, and cs.LO

Abstract: Explainable artificial intelligence has rapidly emerged since lawmakers have started requiring interpretable models for safety-critical domains. Concept-based neural networks have arisen as explainable-by-design methods as they leverage human-understandable symbols (i.e. concepts) to predict class memberships. However, most of these approaches focus on the identification of the most relevant concepts but do not provide concise, formal explanations of how such concepts are leveraged by the classifier to make predictions. In this paper, we propose a novel end-to-end differentiable approach enabling the extraction of logic explanations from neural networks using the formalism of First-Order Logic. The method relies on an entropy-based criterion which automatically identifies the most relevant concepts. We consider four different case studies to demonstrate that: (i) this entropy-based criterion enables the distillation of concise logic explanations in safety-critical domains from clinical data to computer vision; (ii) the proposed approach outperforms state-of-the-art white-box models in terms of classification accuracy and matches black box performances.

Summary

The paper introduces an entropy-based technique that uses first-order logic to derive formal explanations from neural networks, bridging transparency and performance.
An entropy-minimization criterion identifies the most relevant high-level concepts, and truth tables built over those concepts are converted into logic formulas.
Experiments on four case studies, ranging from clinical data to image recognition, show higher classification accuracy than state-of-the-art white-box models, accuracy comparable to black-box models, and concise explanations.

Entropy-based Logic Explanations of Neural Networks

The paper "Entropy-based Logic Explanations of Neural Networks" introduces a novel approach for deriving formal explanations from neural networks using First-Order Logic (FOL). The proposed method leverages an entropy-based criterion to identify relevant concepts in data, enabling the extraction of concise logic explanations, and aims to bridge the gap between explainability and the high performance typically associated with black-box models.

Introduction

The lack of transparency in neural networks poses challenges for their application in critical domains, where explainability is necessary for trust and compliance with regulations. Concept-based neural networks offer a solution by using high-level human-understandable concepts for predictions, yet many existing methods focus only on identifying relevant concepts without detailing how they contribute to classification.

Methodology

The paper introduces an entropy-based mechanism designed to distill logic explanations from neural networks:

Entropy-based layer: This layer scores the relevance of each input concept and minimizes the entropy of those relevance scores, so that each prediction depends on only a few concepts. A truth table over the selected concepts then represents the decision logic of the network, which fosters the emergence of simple logic explanations.
Loss Function: The model is trained on the sum of a standard supervised loss and a weighted entropy term; lowering the entropy encourages simpler explanations.
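
The two ingredients above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: it assumes the relevance scores are obtained by a softmax over raw per-concept scores, and the names `softmax`, `entropy`, `combined_loss`, `lam`, and `temperature` are hypothetical.

```python
import math

def softmax(scores, temperature=1.0):
    """Normalize raw per-concept scores into a distribution that sums to 1.
    Assumption: relevance is a softmax over raw scores (illustrative only)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def combined_loss(supervised_loss, concept_scores, lam=0.1, temperature=1.0):
    """Supervised loss plus a weighted entropy penalty on concept relevance.
    `lam` is a hypothetical trade-off weight."""
    relevance = softmax(concept_scores, temperature)
    return supervised_loss + lam * entropy(relevance)

# Relevance spread evenly over four concepts: maximal entropy, log(4).
uniform = softmax([1.0, 1.0, 1.0, 1.0])
# Relevance concentrated on one concept: entropy close to zero.
peaked = softmax([8.0, 0.0, 0.0, 0.0])
```

With the same supervised loss, the peaked relevance distribution yields a lower combined loss than the uniform one, which is the pressure that pushes the network toward explanations involving few concepts.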

Logic Explanations

The logic explanations are formulated using:

Truth tables: These tables summarize the network's behavior in terms of activation patterns of input concepts.
First-Order Logic (FOL) formulas: These formulas are derived from the truth tables and provide explanations both for individual predictions and for entire classes.
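
As a rough sketch of how a truth table can be turned into formulas, the snippet below builds a conjunction of literals for one example and a disjunction of those conjunctions for a class. It assumes concept activations and predictions have already been Booleanized; the function names and the concept names `c1` and `c2` are hypothetical, and the simplification steps a real pipeline would apply are omitted.

```python
def example_explanation(row, concept_names):
    """Conjunction of literals describing one truth-table row."""
    literals = [name if active else f"~{name}"
                for name, active in zip(concept_names, row)]
    return " & ".join(literals)

def class_explanation(truth_table, predictions, concept_names, target=True):
    """Disjunction of the distinct row-level conjunctions predicted as `target`."""
    terms = []
    for row, pred in zip(truth_table, predictions):
        if pred == target:
            term = example_explanation(row, concept_names)
            if term not in terms:  # keep each distinct pattern once, in order
                terms.append(term)
    return " | ".join(f"({t})" for t in terms)

# Toy truth table over two hypothetical concepts (not data from the paper).
concepts = ["c1", "c2"]
table = [(True, False), (True, True), (False, False), (True, False)]
preds = [True, True, False, True]

single = example_explanation(table[0], concepts)
formula = class_explanation(table, preds, concepts)
print(single)   # c1 & ~c2
print(formula)  # (c1 & ~c2) | (c1 & c2)
```

The class-level formula here could be simplified further to `c1`; keeping the number of relevant concepts small is what keeps such formulas short in the first place.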

Experimental Validation

The paper evaluates the proposed approach on four case studies: a medical dataset (MIMIC-II), a socio-political dataset (V-Dem), and two image recognition problems (MNIST, CUB-200):

Classification Accuracy: The entropy-based network outperforms state-of-the-art white-box models and matches the accuracy of black-box models, so it can potentially replace white-box models in scenarios requiring transparency.
Explanation Quality: The model provides accurate and concise explanations. In terms of explanation complexity and test error its solutions are non-dominated, meaning no compared method is better on both criteria at once, which fits the human preference for simpler explanations.
Efficiency: The entropy-based approach offers a favorable trade-off between training time and explanation quality compared to other rule extraction methods.
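
The notion of a non-dominated solution used above can be made concrete with a small sketch. Each method is represented by a pair (explanation complexity, test error), lower being better on both axes. The method labels and numbers are made up for illustration and are not results from the paper; `dominates` and `non_dominated` are hypothetical helper names.

```python
def dominates(a, b):
    """a dominates b if it is no worse on both axes and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def non_dominated(points):
    """Return the names of all points that no other point dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }

# Illustrative values only: (explanation complexity, test error).
toy = {"A": (3, 0.10), "B": (10, 0.08), "C": (12, 0.15)}
front = non_dominated(toy)
print(sorted(front))  # ['A', 'B']
```

Here `C` is dominated by `A`, which is both simpler and more accurate, while `A` and `B` trade complexity against error and therefore both stay on the front.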

Conclusion

This research signifies a step toward integrating high-performance neural networks in domains governed by strict explainability requirements. The entropy-based mechanism facilitates deriving formal logic explanations that can potentially aid the scientific investigation of complex patterns modeled by neural networks. Future developments could focus on enhancing the automatic generation of interpretable concepts from raw data, further reducing the manual annotation burden and increasing applicability in varied real-world contexts.