On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box (2308.09381v3)
Abstract: Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the features that most influence the decision being explained. While computing feature attributions from gradients delivers promising results, the internal model access required to obtain gradients can be impractical because of safety concerns, which limits the applicability of gradient-based approaches. To overcome this limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), an approach that produces gradient-like explanations using only query-level access. The proposed approach satisfies a set of fundamental properties for attribution methods, each of which is rigorously proved, ensuring the quality of its explanations. Complementing the theoretical analysis, experiments focused on image data demonstrate that the proposed method outperforms state-of-the-art black-box methods and performs competitively with methods that have full access to the model.
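To make "gradient-like explanations through only query-level access" concrete, the sketch below estimates a gradient purely from model evaluations, using a generic antithetic (mirrored) Gaussian-smoothing estimator. This is an assumption chosen for illustration, not necessarily the exact estimator of the proposed method; the helper `estimate_gradient` and its parameters (`sigma`, `n_samples`, `seed`) are hypothetical. Each sample queries the black box at the mirrored pair x + σu and x − σu and projects the central difference back onto the random direction u, so the result approximates the gradient of a Gaussian-smoothed version of f.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def estimate_gradient(
    f: Callable[[Vector], float],
    x: Vector,
    sigma: float = 0.01,
    n_samples: int = 1000,
    seed: Optional[int] = None,
) -> Vector:
    """Estimate grad f(x) using only queries of the black box f.

    Hypothetical helper (illustrative assumption), antithetic Gaussian smoothing:
        g = 1 / (2 * sigma * N) * sum_i [f(x + sigma*u_i) - f(x - sigma*u_i)] * u_i,
    with each u_i drawn from a standard normal distribution N(0, I).
    """
    rng = random.Random(seed)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n_samples):
        # Random search direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Two black-box queries at the mirrored points x + sigma*u and x - sigma*u.
        f_plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        # Central finite difference along u, projected back onto u.
        scale = (f_plus - f_minus) / (2.0 * sigma)
        for j in range(dim):
            grad[j] += scale * u[j]
    # Monte Carlo average over all sampled directions.
    return [g / n_samples for g in grad]


if __name__ == "__main__":
    # Toy "black box": a linear score whose true gradient is w.
    w = [1.0, -2.0, 3.0]

    def model(v: Vector) -> float:
        return sum(wi * vi for wi, vi in zip(w, v))

    g = estimate_gradient(model, [0.5, 0.5, 0.5], sigma=0.01, n_samples=20000, seed=0)
    print([round(gi, 2) for gi in g])  # values near [1.0, -2.0, 3.0]
```

Mirrored sampling is used here because pairing u with −u cancels the even-order terms of the Taylor expansion, which typically lowers the variance of the estimate for the same query budget. The estimate is stochastic: more samples tighten it at the cost of more queries.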