Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints (2301.10923v2)

Published 26 Jan 2023 in cs.LG and cs.AI

Abstract: In safety-critical robotic tasks, potential failures must be reduced and multiple constraints must be met, such as avoiding collisions, limiting energy consumption, and maintaining balance. Thus, applying safe reinforcement learning (RL) to such tasks requires handling multiple constraints and using risk-averse constraints rather than risk-neutral ones. To this end, we propose a trust region-based safe RL algorithm for multiple constraints called the safe distributional actor-critic (SDAC). Our main contributions are: 1) a gradient integration method that manages infeasibility issues in multi-constrained problems while ensuring theoretical convergence, and 2) a TD($\lambda$) target distribution that estimates risk-averse constraints with low bias. We evaluate SDAC through extensive experiments on multi- and single-constrained robotic tasks. While maintaining high scores, SDAC requires 1.93 times fewer steps to satisfy all constraints in multi-constrained tasks and incurs 1.78 times fewer constraint violations in single-constrained tasks than safe RL baselines. Code is available at: https://github.com/rllab-snu/Safe-Distributional-Actor-Critic.
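As a rough illustration of the second contribution, the sketch below builds a TD($\lambda$)-style target for a quantile-based cost critic as a geometric mixture of n-step bootstrapped targets, computed with a backward recursion over a trajectory segment. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the array shapes, and the elementwise interpolation of quantile vectors are all illustrative choices, and SDAC's actual target-distribution construction (and its bias-reduction details) may differ.

```python
import numpy as np

def td_lambda_target_quantiles(costs, dones, next_quantiles, gamma=0.99, lam=0.95):
    """Hypothetical TD(lambda)-style distributional targets for a cost critic.

    costs:          (T,)   per-step costs along a trajectory segment
    dones:          (T,)   episode-termination flags
    next_quantiles: (T, N) critic quantiles of the cost distribution at s_{t+1}
    Returns:        (T, N) target quantile values for each step t.
    """
    T, N = next_quantiles.shape
    targets = np.zeros((T, N))
    # Bootstrap value used for the lambda-return beyond the segment boundary.
    running = next_quantiles[-1]
    for t in reversed(range(T)):
        if dones[t]:
            # No bootstrapping past a terminal state.
            targets[t] = costs[t]
        else:
            # Z_t^lambda = c_t + gamma * [(1 - lam) * Z(s_{t+1}) + lam * Z_{t+1}^lambda]
            mix = (1.0 - lam) * next_quantiles[t] + lam * running
            targets[t] = costs[t] + gamma * mix
        running = targets[t]
    return targets
```

Note that interpolating quantile vectors elementwise is only a crude stand-in for mixing the underlying distributions; sample-based mixtures (as in GMAC-style distributional critics) are a more faithful, if more expensive, alternative.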
