DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces (2306.15913v1)
Abstract: The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action and to predict the expected future state. We describe an encoder-decoder architecture for action embeddings with a dual channel loss that balances action reconstruction accuracy against state prediction accuracy. We use the trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture outperforms two competitive baselines in two diverse environments: a 2D maze environment with more than 4000 discrete noisy actions, and a product recommendation task that uses real-world e-commerce transaction data. Empirical results show that the model produces cleaner action embeddings and that the improved representations help learn better policies that converge earlier.
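The abstract describes the dual channel loss only qualitatively. The sketch below is one plausible reading under stated assumptions, not the authors' formulation. It assumes that the reconstruction channel is a softmax cross-entropy over the discrete actions, that the state channel is a mean squared error on the predicted next state, and that the two are mixed by a hypothetical weight `alpha`. The linear decoder and all function names (`linear_decoder`, `dual_channel_loss`, `select_action`) are illustrative. The network that produces `predicted_next_state` is left out and taken as an input.

```python
import math
from typing import List, Sequence


def linear_decoder(embedding: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Hypothetical decoder: one logit per discrete action, logits = W e + b."""
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared error between predicted and observed next-state vectors."""
    if len(pred) != len(target):
        raise ValueError("state vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_channel_loss(action_logits: Sequence[float], action: int,
                      predicted_next_state: Sequence[float],
                      next_state: Sequence[float],
                      alpha: float = 0.5) -> float:
    """Weighted sum of the action-reconstruction and state-prediction channels.

    `alpha` is an assumed trade-off weight: alpha = 1 keeps only reconstruction,
    alpha = 0 keeps only state prediction.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reconstruction = cross_entropy(action_logits, action)
    state_prediction = mean_squared_error(predicted_next_state, next_state)
    return alpha * reconstruction + (1.0 - alpha) * state_prediction


def select_action(embedding: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> int:
    """Map a continuous embedding emitted by the policy to a discrete action index."""
    logits = linear_decoder(embedding, weights, bias)
    # Greedy decoding: pick the action whose logit is largest.
    return max(range(len(logits)), key=logits.__getitem__)
```

At acting time, a policy that outputs continuous vectors would emit `embedding`, and `select_action` passes it through the trained decoder to recover one of the discrete actions. Mixing the two channels with a single convex weight is the simplest way to trade reconstruction fidelity against predictive structure in the embedding space. It is an assumption made here for illustration.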