The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons (1912.04930v1)

Published 10 Dec 2019 in cs.CY

Abstract: Counterfactual explanations are gaining prominence within technical, legal, and business circles as a way to explain the decisions of a machine learning model. These explanations share a trait with the long-established "principal reason" explanations required by U.S. credit laws: they both explain a decision by highlighting a set of features deemed most relevant--and withholding others. These "feature-highlighting explanations" have several desirable properties: They place no constraints on model complexity, do not require model disclosure, detail what needed to be different to achieve a different decision, and seem to automate compliance with the law. But they are far more complex and subjective than they appear. In this paper, we demonstrate that the utility of feature-highlighting explanations relies on a number of easily overlooked assumptions: that the recommended change in feature values clearly maps to real-world actions, that features can be made commensurate by looking only at the distribution of the training data, that features are only relevant to the decision at hand, and that the underlying model is stable over time, monotonic, and limited to binary outcomes. We then explore several consequences of acknowledging and attempting to address these assumptions, including a paradox in the way that feature-highlighting explanations aim to respect autonomy, the unchecked power that feature-highlighting explanations grant decision makers, and a tension between making these explanations useful and the need to keep the model hidden. While new research suggests several ways that feature-highlighting explanations can work around some of the problems that we identify, the disconnect between features in the model and actions in the real world--and the subjective choices necessary to compensate for this--must be understood before these techniques can be usefully implemented.

Summary

The paper critically examines four hidden assumptions underlying counterfactual explanations and principal reasons in AI, arguing that they are easily overlooked yet determine whether such explanations are useful.
Key challenges include mapping feature changes to actionable steps and the arbitrariness of scaling features by their training-data distribution.
The paper traces consequences such as an autonomy paradox, the unchecked power these explanations grant decision makers, and a tension with keeping the model hidden, and it calls for greater disclosure and empirical validation.

Insightful Overview of "The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons"

The paper "The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons," authored by Solon Barocas, Andrew D. Selbst, and Manish Raghavan, critically examines the utility and complexity of feature-highlighting explanations for machine learning models. Through an exploration of counterfactual explanations and principal reasons, the authors investigate the implicit assumptions and practical challenges in deploying these approaches for understanding algorithmic decisions. This essay evaluates the paper's key arguments and discusses the implications and future directions in the domain of explainable artificial intelligence (XAI).

The authors note that feature-highlighting explanations, such as counterfactual explanations and principal reasons, are increasingly adopted because they avoid disclosing the full model while still providing a rationale in a form that appears to satisfy legal requirements. Counterfactual explanations identify minimal changes to input features that would alter the model’s decision, while principal reasons, required under U.S. credit laws, identify the key factors behind a decision. Despite their appeal, the usefulness of these explanations hinges on several assumptions about how they are implemented and interpreted.
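To make the two explanation types concrete, here is a minimal sketch assuming a hypothetical linear credit-scoring model. The feature names, weights, the "ideal" reference applicant, and the rule of ranking principal reasons by each feature's shortfall relative to that reference are all assumptions for illustration; the paper does not prescribe any of them. For a linear model, the closest counterfactual in Euclidean distance has a closed form: move along the weight vector until the score reaches the threshold.

```python
import numpy as np

# Hypothetical linear credit-scoring model: approve when w @ x + b >= 0.
FEATURES = ["income_k", "debt_ratio", "years_employed"]
w = np.array([0.05, -4.0, 0.3])
b = -2.0


def score(x: np.ndarray) -> float:
    return float(w @ x + b)


def minimal_counterfactual(x: np.ndarray, margin: float = 1e-6) -> np.ndarray:
    """Closest point in Euclidean distance whose score reaches `margin`.

    For a linear model the shortest move is along w, giving the closed form
    x' = x + ((margin - score(x)) / ||w||^2) * w.
    """
    gap = margin - score(x)
    if gap <= 0:
        return x.copy()  # already approved, so nothing needs to change
    return x + (gap / float(w @ w)) * w


def principal_reasons(x: np.ndarray, reference: np.ndarray, k: int = 2) -> list:
    """Up to k features that pulled the score furthest below a reference applicant.

    Feature i contributes w[i] * (x[i] - reference[i]); the most negative
    contributions are reported as the reasons for denial.
    """
    contrib = w * (x - reference)
    order = np.argsort(contrib)  # most negative contribution first
    return [FEATURES[i] for i in order[:k] if contrib[i] < 0]


x = np.array([40.0, 0.45, 2.0])          # score(x) = -1.2, so the applicant is denied
x_cf = minimal_counterfactual(x)
reference = np.array([80.0, 0.1, 10.0])  # hypothetical "ideal" applicant
print(np.round(x_cf, 3))
print(principal_reasons(x, reference))
```

With these made-up numbers the two methods highlight different features: the counterfactual cuts debt_ratio from 0.45 to about 0.15 while leaving income and tenure almost unchanged in their own units, whereas the principal reasons are years_employed and income_k. Which explanation a subject receives is itself a design choice.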

Central to the paper are four assumptions about feature-highlighting explanations: that recommended feature changes map clearly onto real-world actions, that features can be made commensurate using only the distribution of the training data, that features are relevant only to the decision at hand, and that the underlying model is stable over time, monotonic, and limited to binary outcomes. The authors argue that these assumptions are easily overlooked, yet they determine whether the explanations are actually useful.

Key Challenges and Assumptions

Mapping Changes to Actions: The assumption that modifications in features directly translate to actionable steps ignores the complexity and potential interdependencies among features. Identifying the actions required to achieve specified feature changes can be nontrivial.
Feature Commensurability: Making features commensurate by rescaling them according to their training-data distribution is an arbitrary choice that need not reflect practical realities such as the cost or difficulty of a change, which undermines the explanation's usefulness.
Cross-domain Relevance: Features considered relevant for one decision might have implications in other domains of an individual's life. The decision subject's broader context and potential negative spillovers from action recommendations must be considered.
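Model Stability and Properties: Real-world models may not be stable over time, monotonic, or limited to binary outcomes. A counterfactual can stop being valid once the model is retrained; in a non-monotonic model, pushing a feature further in the recommended direction need not keep improving the outcome; and where a decision has more than two possible outcomes, a single counterfactual says little about which outcome a change will produce.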
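The commensurability point can be made concrete. The sketch below assumes a hypothetical two-feature linear model and made-up training data. If distance is measured as an L1 norm with each feature divided by a scale estimated from the training data, the cheapest counterfactual for a linear model changes only the feature with the lowest ratio scales[i] / |w[i]|. Swapping the standard deviation for the median absolute deviation (MAD), both plausible choices, flips which feature the explanation tells the applicant to change. The helper names `mad` and `cheapest_single_feature_cf` are illustrative, not from the paper.

```python
import numpy as np


def mad(col: np.ndarray) -> float:
    """Median absolute deviation of one feature column."""
    return float(np.median(np.abs(col - np.median(col))))


def cheapest_single_feature_cf(x, w, b, scales, margin=1e-6):
    """Minimise sum_i |x'_i - x_i| / scales[i] subject to w @ x' + b >= margin.

    For a linear model this scaled-L1 problem is solved by moving only the
    feature with the lowest cost per unit of score, scales[i] / |w[i]|.
    Assumes every w[i] is non-zero. Returns (counterfactual, index moved).
    """
    gap = margin - (w @ x + b)
    if gap <= 0:
        return x.copy(), None  # already approved
    i = int(np.argmin(scales / np.abs(w)))
    x_cf = x.astype(float).copy()
    x_cf[i] += gap / w[i]  # the sign of w[i] sets the direction of the move
    return x_cf, i


# Hypothetical training data: feature 0 has one extreme outlier, feature 1 is evenly spread.
X_train = np.column_stack([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    [0, 4, 8, 12, 16, 20, 24, 28, 32, 36],
]).astype(float)
w = np.array([1.0, 1.0])
b = -20.0
x = np.array([3.0, 10.0])  # score = -7, so denied

scales_mad = np.array([mad(X_train[:, j]) for j in range(X_train.shape[1])])  # [2.5, 10.0]
scales_std = X_train.std(axis=0)                                                # about [28.6, 11.5]

cf_mad, i_mad = cheapest_single_feature_cf(x, w, b, scales_mad)
cf_std, i_std = cheapest_single_feature_cf(x, w, b, scales_std)
print(i_mad, i_std)  # 0 1: the two scalings recommend changing different features
```

Neither scaling knows anything about how hard each change is for a real person; both are statistics of the training data, which is exactly the gap the paper points to.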
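Monotonicity and stability can be illustrated with an even smaller toy. The sketch assumes a hypothetical scoring rule based on the number of open credit lines, in which a moderate number is best, and a hypothetical retrained version whose acceptance window has shifted. The names `approve_v1`, `approve_v2`, and `smallest_increase` are invented for illustration.

```python
def approve_v1(open_lines: int) -> bool:
    # Hypothetical non-monotonic rule: approval requires a moderate number of credit lines.
    return 2 <= open_lines <= 5


def approve_v2(open_lines: int) -> bool:
    # Hypothetical retrained model whose acceptance window has shifted.
    return 3 <= open_lines <= 6


def smallest_increase(approve, current: int, max_step: int = 20):
    """Smallest non-negative increase in open_lines that `approve` accepts, or None."""
    for step in range(max_step + 1):
        if approve(current + step):
            return step
    return None


applicant = 0
step = smallest_increase(approve_v1, applicant)
print(step, approve_v1(applicant + step))  # 2 True: "open two more credit lines"
print(approve_v1(applicant + 7))           # False: overshooting the advice is penalised
print(approve_v2(applicant + step))        # False: the same advice fails after retraining
```

The counterfactual "open two more credit lines" is correct for the model that produced it, but it silently assumes that more is better and that the model will not change before the applicant acts.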

Implications and Normative Tensions

The paper highlights tensions that arise when these explanations move from theory to practice. Notably, the autonomy paradox arises because explanations meant to respect decision subjects' autonomy can only be tailored to their circumstances if decision makers collect more data about them or make choices on their behalf. Because decision makers decide which of many possible explanations to disclose, partial disclosure also grants them considerable power, shifting control away from subjects. Finally, making explanations more useful tends to reveal more of the model, which conflicts with protecting intellectual property and with concerns that subjects will game the model.
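The tension between useful explanations and model secrecy is sharpest for simple models. A minimal sketch, assuming a hypothetical linear model whose decision maker returns the exact minimal-L2 counterfactual on the decision boundary: a single such explanation reveals the boundary's normal vector and offset (up to a positive scale that does not affect decisions), so the hidden rule can be rebuilt from one query. The weights and the `explain` function are made up; real models are typically nonlinear and would require many more queries to reconstruct.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden model (hypothetical): approve when w_secret @ x + b_secret >= 0.
w_secret = np.array([0.8, -1.5, 0.4])
b_secret = -0.3


def explain(x: np.ndarray) -> np.ndarray:
    """Decision maker's hypothetical API: minimal L2 counterfactual on the boundary."""
    s = w_secret @ x + b_secret
    return x - (s / (w_secret @ w_secret)) * w_secret


# A single denied applicant asks for an explanation.
x = np.array([0.0, 1.0, 0.0])  # score = -1.8, so denied
x_cf = explain(x)

# The move x_cf - x is parallel to w_secret, so it reveals the boundary's normal,
# and x_cf lies on the boundary, which pins down the offset.
w_hat = (x_cf - x) / np.linalg.norm(x_cf - x)
b_hat = -w_hat @ x_cf

# Compare the reconstructed rule with the hidden one on fresh random applicants.
X_test = rng.normal(size=(1000, 3))
agree = np.mean((X_test @ w_hat + b_hat >= 0) == (X_test @ w_secret + b_secret >= 0))
print(agree)
```

The reconstructed pair equals the hidden weights and bias divided by the norm of the weights, so it makes the same decisions; the more informative the explanation, the less of the model stays hidden.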

Implications for Future Research and Policy

The authors call for disclosure of explanation generation methods and suggest exploring legal fiduciary duties to align explanations with decision subjects' best interests. They advocate for empirical research to validate the real-world effectiveness of proposed explanations and underscore the need for interdisciplinary collaboration among computer scientists, legal experts, and social scientists.

This paper examines the difficulties of implementing counterfactual and principal-reason explanations in AI systems and underscores the need to confront their underlying assumptions in practice. Addressing these assumptions is essential if such explanations are to move from theoretical possibility to actionable insight for both decision subjects and decision makers, and thereby advance the field of XAI.
