
Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy (2403.01734v1)

Published 4 Mar 2024 in cs.RO, cs.AI, and cs.LG

Abstract: Offline goal-conditioned reinforcement learning (GCRL) aims to solve goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for learning near-optimal policies, these methods struggle with the diverse constraints found in complex environments, such as safety constraints. Some approaches prioritize goal attainment without considering safety, while others focus excessively on safety at the expense of training efficiency. In this paper, we study the problem of constrained offline GCRL and propose a new method, Recovery-based Supervised Learning (RbSL), to accomplish safety-critical tasks with various goals. To evaluate its performance, we build a benchmark based on the robot-fetching environment with a randomly positioned obstacle and use expert or random policies to generate an offline dataset. We compare RbSL with three offline GCRL algorithms and one offline safe RL algorithm, and our method outperforms the existing state of the art by a large margin. Furthermore, we validate the practicality and effectiveness of RbSL by deploying it on a real Panda manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.
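The abstract describes a two-policy structure: a goal-conditioned policy that pursues the commanded goal, plus a recovery policy that takes over when a safety constraint (here, a randomly positioned obstacle) is about to be violated. The sketch below illustrates that general pattern only; it is not the authors' RbSL implementation (see the linked repository for that), and every name in it, including the distance-based risk estimate and the risk_threshold parameter, is an assumed placeholder.

```python
import numpy as np

# Minimal sketch of a task/recovery policy switch. All names below
# (GoalPolicy, RecoveryPolicy, estimated_risk, risk_threshold) are
# illustrative assumptions, not the paper's actual RbSL components.

class GoalPolicy:
    """Goal-conditioned policy, e.g., trained by supervised learning on offline data."""
    def act(self, state: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # Placeholder behavior: steer directly toward the goal.
        return np.clip(goal - state, -1.0, 1.0)

class RecoveryPolicy:
    """Policy that steers the agent out of constraint-violating regions."""
    def act(self, state: np.ndarray, obstacle: np.ndarray) -> np.ndarray:
        # Placeholder behavior: move directly away from the obstacle.
        return np.clip(state - obstacle, -1.0, 1.0)

def estimated_risk(state: np.ndarray, obstacle: np.ndarray) -> float:
    """Stand-in for a learned safety critic; here, inverse distance to the obstacle."""
    return 1.0 / (float(np.linalg.norm(state - obstacle)) + 1e-6)

def select_action(state, goal, obstacle, task_policy, recovery_policy,
                  risk_threshold: float = 2.0) -> np.ndarray:
    """Use the recovery policy whenever estimated constraint-violation risk is high."""
    if estimated_risk(state, obstacle) > risk_threshold:
        return recovery_policy.act(state, obstacle)
    return task_policy.act(state, goal)

# Example: close to the obstacle, the recovery policy takes over.
state = np.array([0.0, 0.0])
goal = np.array([1.0, 1.0])
obstacle = np.array([0.1, 0.0])
action = select_action(state, goal, obstacle, GoalPolicy(), RecoveryPolicy())
print(action)  # moves away from the obstacle, since risk > threshold here
```

In a full method, the risk estimate would typically be a learned safety critic fit to constraint-violation labels in the offline dataset, and the threshold governs the trade-off between constraint satisfaction and goal-reaching efficiency.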

Authors (5)
  1. Chenyang Cao (5 papers)
  2. Zichen Yan (3 papers)
  3. Renhao Lu (4 papers)
  4. Junbo Tan (10 papers)
  5. Xueqian Wang (99 papers)
Citations (2)
