From Feature Importance to Natural Language Explanations Using LLMs with RAG (2407.20990v1)
Abstract: As machine learning becomes increasingly integral to autonomous decision-making processes that involve human interaction, the need to understand a model's outputs through conversational means grows. Recently, foundation models have been explored as post hoc explainers, offering a pathway to elucidate the decision-making mechanisms of predictive models. In this work, we introduce traceable question-answering, which leverages an external knowledge repository to inform the responses of LLMs to user queries within a scene understanding task. This knowledge repository comprises contextual details about the model's output, including high-level features, feature importance, and alternative probabilities. We employ subtractive counterfactual reasoning to compute feature importance, analysing the output variations that result from decomposing semantic features. Furthermore, to maintain a seamless conversational flow, we integrate four key characteristics - social, causal, selective, and contrastive - drawn from social science research on human explanations into a single-shot prompt that guides the response generation process. Our evaluation demonstrates that the explanations generated by the LLM encompass these elements, indicating their potential to bridge the gap between complex model outputs and natural language expressions.
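To make the pipeline concrete, below is a minimal sketch (not the authors' code) of the two ingredients the abstract describes: subtractive counterfactual feature importance, where each high-level semantic feature is removed from the scene and the drop in the predicted class probability is recorded, and the assembly of those results into a contextual record for the external knowledge repository. The classifier interface `predict_proba`, the masking scheme, and all function names are illustrative assumptions.

```python
# Hedged sketch of subtractive counterfactual feature importance and the
# knowledge-repository record it feeds; interfaces here are assumptions,
# not the paper's implementation.
from typing import Callable, Dict, List

import numpy as np


def subtractive_importance(
    image: np.ndarray,
    masks: Dict[str, np.ndarray],          # semantic feature name -> boolean pixel mask
    predict_proba: Callable[[np.ndarray], np.ndarray],  # image -> class probabilities
    target_class: int,
) -> Dict[str, float]:
    """Importance of each semantic feature as the probability change when it is removed."""
    baseline = predict_proba(image)[target_class]
    importance: Dict[str, float] = {}
    for name, mask in masks.items():
        ablated = image.copy()
        ablated[mask] = 0                  # subtract the feature by blanking its pixels
        importance[name] = float(baseline - predict_proba(ablated)[target_class])
    return importance


def build_knowledge_record(
    scene_label: str,
    importance: Dict[str, float],
    alternatives: Dict[str, float],        # alternative class -> predicted probability
) -> str:
    """Assemble the contextual record stored in the external repository for retrieval."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    lines: List[str] = [f"Predicted scene: {scene_label}"]
    lines += [f"Feature '{name}' importance: {score:.3f}" for name, score in ranked]
    lines += [f"Alternative '{label}' probability: {p:.3f}" for label, p in alternatives.items()]
    return "\n".join(lines)
```

At query time, such a record would be retrieved and placed in the single-shot prompt alongside instructions covering the social, causal, selective, and contrastive characteristics, so that the LLM's answer is grounded in the stored model outputs rather than in free-form generation.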