Deep Successor Reinforcement Learning (1606.02396v1)

Published 8 Jun 2016 in stat.ML, cs.AI, cs.LG, and cs.NE

Abstract: Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations -- simple grid-world domains (MazeBase) and the Doom game engine.

Citations (204)

Summary

  • The paper introduces a novel framework that decomposes the value function into a reward predictor and a successor map for efficient deep reinforcement learning.
  • It demonstrates end-to-end learning from raw pixel inputs using deep neural networks to approximate state representations and reward signals in environments like grid-world and Doom.
  • The approach facilitates automatic subgoal extraction, improving hierarchical reinforcement learning and enhancing exploration in dynamic reward settings.

Overview of Deep Successor Reinforcement Learning

The paper "Deep Successor Reinforcement Learning" presents an innovative approach to reinforcement learning, specifically targeting the integration of deep learning methods with Successor Representations (SR). SR is introduced as a robust alternative to traditional model-free and model-based reinforcement learning paradigms. The core idea involves decomposing the value function into two distinct components: a reward predictor and a successor map, thereby enabling efficient and flexible value function learning.

The authors introduce Deep Successor Reinforcement Learning (DSR), which extends the SR framework with deep neural networks, allowing end-to-end learning from raw pixel observations. The method is validated empirically in two diverse environments: MazeBase grid worlds and the Doom game engine.

Key Contributions and Methodological Advancements

  1. Successor Representation Framework: At the core of DSR is the decomposition of the value function into a reward predictor and a successor map. The successor map captures expected future state occupancy; because world dynamics and rewards are factorized, the value estimate responds quickly to changes in distal rewards, an advantage over monolithic model-free value estimates.
  2. End-to-End Deep Learning Integration: DSR uses deep neural networks to approximate the successor representation and the reward function from raw sensory inputs, with convolutional and linear layers learning state representations and immediate rewards that drive policy learning (a minimal sketch of such an architecture follows this list).
  3. Automatic Subgoal Extraction: The paper highlights an intriguing use-case of DSR in hierarchical reinforcement learning by extracting bottleneck states, or subgoals, using the learned successor representations. This approach facilitates exploration and could be pivotal in improving hierarchical RL frameworks.
  4. Empirical Validation: DSR is benchmarked against existing models such as DQN and is competitive at reaching goals in both the MazeBase grid worlds and the 3D Doom environment. Its ability to converge quickly after the reward function is modified underscores its practical applicability.
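
To make point 2 concrete, below is a minimal PyTorch-style sketch of a DSR-like network: a convolutional encoder produces features φ(s), a linear head predicts the immediate reward as φ(s)ᵀw, per-action heads output successor features m(s, a), and Q(s, a) is the inner product of m(s, a) with the reward weights. Layer sizes, names, and the single-frame RGB input are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DSRSketch(nn.Module):
    """Illustrative DSR-style network (sizes and names are assumptions, not the paper's exact spec)."""

    def __init__(self, n_actions: int, feat_dim: int = 256):
        super().__init__()
        # Convolutional encoder: raw pixels -> state features phi(s)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(feat_dim), nn.ReLU(),
        )
        # Linear reward model: R(s) ~= phi(s) . w
        self.reward_weights = nn.Linear(feat_dim, 1, bias=False)
        # One successor head per action: m(s, a) lives in feature space
        self.successor_heads = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(n_actions)
        )

    def forward(self, obs: torch.Tensor):
        phi = self.encoder(obs)                                            # (B, feat_dim)
        reward = self.reward_weights(phi).squeeze(-1)                      # (B,) predicted immediate reward
        succ = torch.stack([h(phi) for h in self.successor_heads], dim=1)  # (B, A, feat_dim)
        # Q(s, a) is the inner product of the successor features with the reward weights
        q_values = succ @ self.reward_weights.weight.squeeze(0)            # (B, A)
        return phi, reward, succ, q_values
```

In the paper, the successor branch is trained with a TD-style target (roughly φ(s) + γ m(s′, a′) for the greedy next action), the reward weights with regression on observed rewards, and a decoder that reconstructs the frame from φ(s) keeps the features informative; those training losses are omitted from this sketch.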

Implications and Prospective Directions

The DSR framework opens several avenues for research and practical applications in the field of artificial intelligence and machine learning:

  • Scalability and Complexity Management: The integration of deep learning with SR addresses scalability challenges in large state spaces. This approach could be critical in developing models that handle more complex environments without exhaustive state-action pair enumerations.
  • Hierarchical Reinforcement Learning: By enabling subgoal extraction, DSR can contribute to the advancement of hierarchical reinforcement learning methods (a rough sketch of such a procedure follows this list). Future research could integrate DSR with option discovery frameworks to improve exploration efficiency and policy refinement.
  • Enhanced Reward Sensitivity: The ability of DSR to adapt swiftly to changes in reward structure suggests its utility in dynamic environments where the reward landscape is not static. This property can be leveraged in domains such as robotics and autonomous systems where adaptability is crucial.
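
As a rough illustration of the subgoal-extraction idea, the sketch below clusters successor-representation vectors gathered under a random policy and flags states whose observed transitions cross a cluster boundary as bottleneck candidates. The k-means step and the helper's name are illustrative assumptions; the paper's own procedure for partitioning the successor map may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_subgoals(successor_vectors: np.ndarray, transitions, n_clusters: int = 2):
    """Cluster SR vectors and flag states on cluster boundaries as subgoal candidates.

    successor_vectors: (N, D) array; row i is the successor-representation vector of
                       state i, collected under a random policy.
    transitions:       iterable of (i, j) index pairs for observed state transitions.
    Returns indices of states whose neighbours fall in a different cluster, i.e. rough
    bottleneck candidates. (Hypothetical helper; the clustering is an illustrative choice.)
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(successor_vectors)
    candidates = set()
    for i, j in transitions:
        if labels[i] != labels[j]:   # transition crosses a cluster boundary
            candidates.update((i, j))
    return sorted(candidates)
```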

Overall, DSR advances reinforcement learning by combining deep learning with an SR-based factorization of the value function. Future work could extend the model to more complex, high-dimensional tasks and integrate richer mechanisms for intrinsic motivation and exploratory behavior, making DSR a useful foundation for RL systems that are both robust and adaptable.
