On quantitative aspects of model interpretability (2007.07584v1)

Published 15 Jul 2020 in cs.LG and stat.ML

Abstract: Despite the growing body of work in interpretable machine learning, it remains unclear how to evaluate different explainability methods without resorting to qualitative assessment and user-studies. While interpretability is an inherently subjective matter, previous works in cognitive science and epistemology have shown that good explanations do possess aspects that can be objectively judged (apart from fidelity), such as simplicity and broadness. In this paper we propose a set of metrics to programmatically evaluate interpretability methods along these dimensions. In particular, we argue that the performance of methods along these dimensions can be orthogonally imputed to two conceptual parts, namely the feature extractor and the actual explainability method. We experimentally validate our metrics on different benchmark tasks and show how they can be used to guide a practitioner in the selection of the most appropriate method for the task at hand.



Summary

The paper proposes quantifiable metrics that assess interpretability based on simplicity, broadness, and fidelity.
It separates model interpretation into feature extraction and explainability methods for precise comparative analysis.
Experiments, including image classification with CNNs and LIME explanations of a decision tree classifier, show how the metrics can guide the choice of an interpretability method.

Quantitative Aspects of Model Interpretability

The paper "On Quantitative Aspects of Model Interpretability" addresses the challenge of evaluating interpretability methods in machine learning without relying solely on qualitative assessments such as user studies. It explores whether objective criteria, grounded in cognitive science and epistemology, can be used to evaluate model interpretability, and presents a set of functionally grounded metrics that quantify interpretability along the dimensions of simplicity, broadness, and fidelity.

Introduction and Motivation

The growing demand for explainability in machine learning, driven by ethical, legal, and practical considerations, has led to a proliferation of interpretability methods. However, the lack of standardized metrics for objective evaluation makes it difficult to compare these methods or select an appropriate one for a given application. Drawing from insights in cognitive science, which suggest that good explanations are often simple and broadly applicable, the authors propose quantitative dimensions along which interpretability can be assessed.

Key Concepts and Proposed Metrics

The paper separates an interpretability method into two components: the feature extractor and the explainability method proper. Because each component can be evaluated on its own, this separation allows a more precise analysis of where an explanation gains or loses interpretability, and it provides a framework for defining evaluation metrics. The proposed dimensions for quantitative assessment include:

Simplicity: Assessed by the effort required to understand the explanation.
Broadness: The applicability of an explanation across different contexts.
Fidelity: The degree to which an explanation accurately represents the model.

The authors introduce metrics for different interpretability modalities, including feature extraction, example-based methods, and feature attribution methods. Specifically, mutual information is used to assess the trade-offs between simplicity, broadness, and fidelity in feature extraction. For example-based methods, metrics such as non-representativeness and diversity are proposed. For feature attribution methods, metrics evaluating monotonicity, non-sensitivity, and effective complexity are discussed.
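To make the example-based metrics more concrete, the following is a minimal sketch rather than the paper's exact formulation. It assumes that diversity can be approximated by the mean pairwise Euclidean distance between the selected examples, and that non-representativeness can be approximated by the mean cross-entropy of the model's predictions on those examples. The function names, the distance, and the loss are all illustrative assumptions.

```python
import numpy as np


def diversity(examples):
    """Mean pairwise Euclidean distance between examples (assumed distance)."""
    x = np.asarray(examples, dtype=float)
    n = len(x)
    if n < 2:
        return 0.0
    # Pairwise differences, shape (n, n, d), reduced to a distance matrix.
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    upper = np.triu_indices(n, k=1)
    return float(dists[upper].mean())


def non_representativeness(pred_probs, labels):
    """Mean cross-entropy of predictions on the examples (assumed loss).

    Lower values mean the examples are predicted more confidently as
    the class they are meant to represent.
    """
    p = np.asarray(pred_probs, dtype=float)
    y = np.asarray(labels, dtype=int)
    eps = 1e-12  # avoids log(0)
    return float(-np.log(p[np.arange(len(y)), y] + eps).mean())
```

Under these assumptions, a set of near-duplicate prototypes scores low on diversity even if each one is individually representative, which is the kind of trade-off the two metrics are meant to expose.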

Experimental Validation

The paper validates the proposed metrics empirically on a variety of benchmark tasks. For example, it evaluates the effect of different feature extractors on the interpretability of LIME when explaining a decision tree classifier, showing how the choice of feature representation affects the fidelity and simplicity of explanations. Example-based explanations are assessed on image classification with CNNs, which exposes the trade-off between representativeness and diversity in prototypical examples. Finally, the paper examines how accurately feature attribution methods reflect the underlying model behavior, and illustrates how the proposed metrics identify the strengths and limitations of various interpretability methods.
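The feature attribution checks can be sketched in a similar way. The code below is an illustrative approximation, not the authors' implementation: it assumes monotonicity is measured as a rank correlation between absolute attributions and some separately measured per-feature impact on the model output, and that non-sensitivity counts features on which the two disagree about being zero. How `impacts` is obtained (for instance by perturbing one feature at a time) is left open, and the simple ranking used here does not average tied values.

```python
import numpy as np


def _ranks(values):
    """Rank positions 0..n-1 (ties broken by order, not averaged)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def monotonicity(attributions, impacts):
    """Rank correlation between |attribution| and measured impact."""
    a = _ranks(np.abs(np.asarray(attributions, dtype=float)))
    b = _ranks(np.asarray(impacts, dtype=float))
    return float(np.corrcoef(a, b)[0, 1])


def non_sensitivity(attributions, impacts, tol=1e-8):
    """Number of features where 'zero attribution' and 'zero impact' disagree."""
    attr = np.abs(np.asarray(attributions, dtype=float))
    imp = np.abs(np.asarray(impacts, dtype=float))
    zero_attr = set(np.flatnonzero(attr <= tol).tolist())
    zero_impact = set(np.flatnonzero(imp <= tol).tolist())
    # Symmetric difference: in one set but not the other.
    return len(zero_attr ^ zero_impact)
```

In this sketch, a monotonicity close to 1 means features with larger attributions also have larger measured impact, and a non-sensitivity of 0 means the attribution method assigns zero exactly to the features that have no measured impact.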

Implications and Future Directions

The introduction of these metrics has significant implications for both the theory and practice of interpretability in AI. Practitioners can use these metrics to guide the selection of interpretability methods that balance simplicity, broadness, and fidelity based on application and user needs. On a theoretical level, the metrics provide a framework for advancing our understanding of interpretability, potentially enabling more rigorous scientific discourse and comparative studies across different methods.

Conclusion

The paper concludes by reaffirming the necessity of quantifiable metrics to complement qualitative assessments in the field of interpretable machine learning. By proposing these metrics, the authors contribute to a more systematic and scientific approach to understanding and improving model interpretability, paving the way for future developments that could enhance transparency, trust, and accountability in AI systems.
