
Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards (2310.03379v2)

Published 5 Oct 2023 in cs.RO

Abstract: Ensuring safety in Reinforcement Learning (RL), typically framed as a Constrained Markov Decision Process (CMDP), is crucial for real-world exploration applications. Current approaches to handling CMDPs struggle to balance optimality and feasibility: direct optimization methods cannot ensure state-wise in-training safety, and projection-based methods correct actions inefficiently through lengthy iterations. To address these challenges, we propose Adaptive Chance-constrained Safeguards (ACS), an adaptive, model-free safe RL algorithm that uses the safety recovery rate as a surrogate chance constraint to iteratively ensure safety during exploration and after convergence. Theoretical analysis indicates that the relaxed probabilistic constraint is sufficient to guarantee forward invariance of the safe set. Extensive experiments on both simulated and real-world safety-critical tasks demonstrate its effectiveness in enforcing safety (nearly zero violations) while preserving optimality (+23.8%), robustness, and fast response in stochastic real-world settings.
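The abstract gives no implementation details, so the following is only a minimal sketch of the general chance-constrained safeguard pattern it alludes to: a task policy proposes an action, and a fallback recovery action is used whenever an estimated violation probability exceeds a tolerance. All names here (`ChanceConstrainedSafeguard`, `safety_critic`, `recovery_policy`, `delta`) are hypothetical placeholders, not the ACS algorithm itself.

```python
import numpy as np


class ChanceConstrainedSafeguard:
    """Illustrative safeguard layer (hypothetical sketch, not the paper's ACS):
    fall back to a recovery action when the estimated probability of leaving
    the safe set exceeds a tolerance delta."""

    def __init__(self, task_policy, recovery_policy, safety_critic, delta=0.05):
        self.task_policy = task_policy          # pi(s) -> reward-maximizing action
        self.recovery_policy = recovery_policy  # pi_rec(s) -> action steering back to the safe set
        self.safety_critic = safety_critic      # risk(s, a) -> estimated violation probability
        self.delta = delta                      # tolerated per-step chance of violation

    def act(self, state):
        action = self.task_policy(state)
        # Surrogate chance constraint: require P(unsafe | s, a) <= delta.
        if self.safety_critic(state, action) > self.delta:
            # Proposed action deemed too risky: hand control to the recovery policy.
            action = self.recovery_policy(state)
        return action


# Toy usage on a 1-D state where the safe set is |s| <= 1.
if __name__ == "__main__":
    task = lambda s: 0.3                          # always push right
    recovery = lambda s: -0.3 * np.sign(s)        # push back toward the origin
    risk = lambda s, a: float(abs(s + a) > 0.9)   # crude violation-probability estimate
    guard = ChanceConstrainedSafeguard(task, recovery, risk, delta=0.05)

    s = 0.0
    for _ in range(10):
        s = float(np.clip(s + guard.act(s), -1.0, 1.0))
    print("final state:", s)
```

The actual method is hierarchical and adapts its constraint over training, and it learns the safety estimator model-free rather than assuming one is given; none of that is captured by this static sketch.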
