Scaling Population-Based Reinforcement Learning with GPU Accelerated Simulation (2404.03336v3)

Published 4 Apr 2024 in cs.RO

Abstract: In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks, such as locomotion and dexterous manipulation. However, this comes at the cost of an enormous amount of experience required for training, exacerbated by the sensitivity of learning efficiency and policy performance to hyperparameter selection, which often requires numerous trials of time-consuming experiments. This work introduces a Population-Based Reinforcement Learning (PBRL) approach that exploits a GPU-accelerated physics simulator to enhance the exploration capabilities of RL by concurrently training multiple policies in parallel. The PBRL framework is applied to three state-of-the-art RL algorithms - PPO, SAC, and DDPG - dynamically adjusting hyperparameters based on the performance of the learning agents. The experiments are performed on four challenging tasks in Isaac Gym - Anymal Terrain, Shadow Hand, Humanoid, and Franka Nut Pick - analyzing the effect of population size and mutation mechanisms for hyperparameters. The results demonstrate that PBRL agents outperform non-evolutionary baseline agents across tasks essential for humanoid robots, such as bipedal locomotion, manipulation, and grasping in unstructured environments. The trained agents are finally deployed in the real world for the Franka Nut Pick manipulation task. To our knowledge, this is the first sim-to-real attempt to successfully deploy PBRL agents on real hardware. Code and videos of the learned policies are available on our project website (https://sites.google.com/view/pbrl).
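
The core PBRL mechanism the abstract describes - periodically replacing underperforming agents with mutated copies of top performers - can be sketched in a few lines of Python. This is a minimal illustrative sketch in the spirit of population-based training, not the authors' implementation; the names (Agent, exploit_and_explore, PERTURB_FACTORS), the 25% cutoff, and the multiplicative-perturbation mutation are all assumptions for illustration.

import copy
import random
from dataclasses import dataclass, field

# Multiplicative perturbation factors for hyperparameter mutation -- a common
# population-based-training choice, assumed here rather than taken from the paper.
PERTURB_FACTORS = (0.8, 1.2)

@dataclass
class Agent:
    hyperparams: dict                            # e.g. {"lr": 3e-4, "entropy_coef": 0.01}
    weights: dict = field(default_factory=dict)  # stand-in for policy parameters
    fitness: float = float("-inf")               # e.g. mean episode return over the last interval

def exploit_and_explore(population, frac=0.25):
    """Replace the bottom `frac` of agents with mutated copies of top performers."""
    ranked = sorted(population, key=lambda a: a.fitness, reverse=True)
    n_cut = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n_cut], ranked[-n_cut:]
    for agent in bottom:
        donor = random.choice(top)
        agent.weights = copy.deepcopy(donor.weights)  # exploit: inherit policy weights
        agent.hyperparams = {                         # explore: mutate hyperparameters
            k: v * random.choice(PERTURB_FACTORS)
            for k, v in donor.hyperparams.items()
        }
    return population

In the paper's setting, each population member would be a full PPO, SAC, or DDPG learner collecting experience in the GPU-parallelized Isaac Gym simulator, with a step like this applied periodically between training intervals; the population size and mutation scheme are precisely the design axes the experiments analyze.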

Authors (8)
  1. Asad Ali Shahid (3 papers)
  2. Yashraj Narang (24 papers)
  3. Vincenzo Petrone (6 papers)
  4. Enrico Ferrentino (12 papers)
  5. Ankur Handa (39 papers)
  6. Dieter Fox (201 papers)
  7. Marco Pavone (314 papers)
  8. Loris Roveda (10 papers)
Citations (2)
