Integrating Distributed Architectures in Highly Modular RL Libraries (2007.02622v3)
Abstract: Advancing reinforcement learning (RL) requires tools that are flexible enough to prototype new methods easily while avoiding impractically slow experimental turnaround times. To meet the first requirement, the most popular RL libraries advocate highly modular agent composability, which facilitates experimentation and development. To solve challenging environments within reasonable time frames, scaling RL to large sampling and computing resources has proved a successful strategy. However, this capability has so far been difficult to combine with modularity. In this work, we explore design choices that allow agent composability at both the local and the distributed level of execution. We propose a versatile approach that allows RL agents to be defined at different scales through independent, reusable components. We demonstrate experimentally that our design choices allow us to reproduce classical benchmarks, explore multiple distributed architectures, and solve novel and complex environments, while giving the user full control over the definition of both the agent and the training scheme. We believe this work can provide useful insights for the next generation of RL libraries.
Authors: Albert Bou, Sebastian Dittert, Gianni De Fabritiis