Highway Graph to Accelerate Reinforcement Learning (2405.11727v2)
Abstract: Reinforcement Learning (RL) algorithms often struggle with low training efficiency. A common approach to this challenge is to integrate model-based planning algorithms, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), with an environment model. However, VI requires iterating over a large tensor, updating the value of each preceding state from its succeeding state through value propagation, which is computationally intensive. To enhance RL training efficiency, we propose improving the efficiency of the value learning process. In deterministic environments with discrete state and action spaces, we observe that, on the sampled empirical state-transition graph, a non-branching sequence of transitions, termed a highway, can take the agent to another state without deviating at intermediate states. On these non-branching highways, the value-updating process can be streamlined into a single-step operation, eliminating the need for step-by-step updates. Building on this observation, we introduce the highway graph to model state transitions. The highway graph compresses the transition model into a compact representation in which a single edge can encapsulate multiple state transitions, enabling value propagation across multiple time steps in one iteration. Integrating the highway graph into RL significantly accelerates training, particularly in its early stages. Experiments across four categories of environments demonstrate that our method learns significantly faster than established and state-of-the-art RL algorithms (often by a factor of 10 to 150) while maintaining equal or superior expected returns. Furthermore, a deep neural network-based agent trained with the highway graph exhibits improved generalization and reduced storage costs. Code is publicly available at https://github.com/coodest/highwayRL.
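To make the idea concrete, below is a minimal Python sketch of highway-graph construction and compressed value iteration, not the authors' implementation (see the repository above for that). The transition format, `GAMMA`, and the functions `build_highways` and `value_iteration` are illustrative assumptions; here a highway is taken to be a chain of intermediate states that each have exactly one incoming and one outgoing transition.

```python
# Hypothetical sketch of the highway-graph idea: compress non-branching
# transition chains into single edges, then run value iteration on the
# compressed graph. Not the paper's implementation.
from collections import defaultdict

GAMMA = 0.99  # assumed discount factor


def build_highways(transitions):
    """Compress non-branching chains into 'highway' edges.

    transitions: dict mapping state -> list of (reward, next_state) pairs
    from a deterministic empirical state-transition graph.
    Returns dict mapping state -> list of (total_reward, length, end_state),
    where each edge spans `length` primitive steps and carries the
    discounted reward accumulated along the chain.
    """
    in_deg = defaultdict(int)
    for outs in transitions.values():
        for _, nxt in outs:
            in_deg[nxt] += 1

    def is_intermediate(s):
        # Intermediate highway states have exactly one in- and one out-edge.
        return len(transitions.get(s, [])) == 1 and in_deg[s] == 1

    highways = {}
    for s, outs in transitions.items():
        if is_intermediate(s):
            continue  # only branching/endpoint states remain as graph nodes
        edges = []
        for r, nxt in outs:
            total, length = r, 1
            seen = set()  # guard against cycles made of intermediate states
            while is_intermediate(nxt) and nxt not in seen:
                seen.add(nxt)
                r2, nxt = transitions[nxt][0]
                total += (GAMMA ** length) * r2
                length += 1
            edges.append((total, length, nxt))
        highways[s] = edges
    return highways


def value_iteration(highways, iters=100):
    """One backup per highway edge propagates value across `length`
    primitive time steps at once, instead of step by step."""
    V = defaultdict(float)
    for _ in range(iters):
        for s, edges in highways.items():
            if edges:
                V[s] = max(t + (GAMMA ** k) * V[end] for t, k, end in edges)
    return V
```

Because a highway of length k carries the chain's accumulated discounted reward and a single discount factor GAMMA**k, one backup over the compressed edge has the same fixed point as k step-by-step backups along the original chain, which is why value propagation speeds up most when the empirical graph contains long non-branching stretches.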