
Understanding Disparities in Post Hoc Machine Learning Explanation (2401.14539v1)

Published 25 Jan 2024 in cs.LG, cs.CY, and stat.ML

Abstract: Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity across sensitive attributes such as 'race' and 'gender'. While a large body of work focuses on mitigating these issues at the explanation-metric level, the roles of the data-generating process and the black-box model in producing explanation disparities remain largely unexplored. Accordingly, through simulations as well as experiments on a real-world dataset, we assess how explanation disparities are affected by properties of the data (limited sample size, covariate shift, concept shift, and omitted variable bias) and by properties of the model (inclusion of the sensitive attribute and appropriate functional form). Our controlled simulation analyses demonstrate that increased covariate shift, concept shift, and omission of covariates increase explanation disparities, with the effect more pronounced for neural network models, which capture the underlying functional form better than linear models. We also observe consistent findings regarding the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, the results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.
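The abstract describes controlled simulations in which data properties such as covariate shift are varied and the fidelity of post-hoc explanations is compared across groups. The sketch below (Python, not the authors' code) illustrates one way such a comparison could be set up: simulate a covariate-shifted group, train a small neural network, and measure per-group fidelity of a LIME-style local linear surrogate. The simulation parameters, labeling function, and R^2-based fidelity metric are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of measuring an explanation-fidelity gap under covariate shift.
# All names, magnitudes, and the fidelity metric are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def simulate_group(n, mean_shift):
    """Draw group covariates with an optional mean shift (covariate shift only:
    both groups share the same labeling function, so there is no concept shift)."""
    X = rng.normal(loc=mean_shift, scale=1.0, size=(n, 5))
    logits = 1.5 * X[:, 0] - 2.0 * X[:, 1] ** 2 + X[:, 2]
    y = (logits + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

def local_fidelity(model, x, n_samples=200, scale=0.3):
    """Fit a LIME-style local linear surrogate around x and return its R^2
    against the black-box probabilities on the perturbation neighborhood."""
    Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    p = model.predict_proba(Z)[:, 1]
    surrogate = Ridge(alpha=1.0).fit(Z, p)
    return surrogate.score(Z, p)

# Group A: reference distribution; group B: covariate-shifted distribution.
XA, yA = simulate_group(2000, mean_shift=0.0)
XB, yB = simulate_group(2000, mean_shift=1.0)
X, y = np.vstack([XA, XB]), np.concatenate([yA, yB])

model = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                      random_state=0).fit(X, y)

fid_A = np.mean([local_fidelity(model, x) for x in XA[:100]])
fid_B = np.mean([local_fidelity(model, x) for x in XB[:100]])
print(f"mean local fidelity  group A: {fid_A:.3f}  group B: {fid_B:.3f}")
print(f"explanation fidelity gap (A - B): {fid_A - fid_B:.3f}")
```

A persistent gap between the two groups' mean local fidelity is the kind of explanation disparity the paper studies; whether a gap appears here, and how large it is, depends entirely on the chosen shift, labeling function, and model.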
