Reconstructing Actions To Explain Deep Reinforcement Learning (2009.08507v3)
Abstract: Feature attribution has been a foundational building block for explaining the input feature importance in supervised learning with Deep Neural Network (DNNs), but face new challenges when applied to deep Reinforcement Learning (RL).We propose a new approach to explaining deep RL actions by defining a class of \emph{action reconstruction} functions that mimic the behavior of a network in deep RL. This approach allows us to answer more complex explainability questions than direct application of DNN attribution methods, which we adapt to \emph{behavior-level attributions} in building our action reconstructions. It also allows us to define \emph{agreement}, a metric for quantitatively evaluating the explainability of our methods. Our experiments on a variety of Atari games suggest that perturbation-based attribution methods are significantly more suitable in reconstructing actions to explain the deep RL agent than alternative attribution methods, and show greater \emph{agreement} than existing explainability work utilizing attention. We further show that action reconstruction allows us to demonstrate how a deep agent learns to play Pac-Man game.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.