
A Survey of Constraint Formulations in Safe Reinforcement Learning (2402.02025v2)

Published 3 Feb 2024 in cs.LG and cs.AI

Abstract: Safety is critical when applying reinforcement learning (RL) to real-world problems. As a result, safe RL has emerged as a fundamental and powerful paradigm for optimizing an agent's policy while incorporating notions of safety. A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to specific safety constraints. Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains difficult. This challenge stems from the diversity of constraint representations and the limited exploration of their interrelations. To bridge this knowledge gap, we present a comprehensive review of representative constraint formulations, along with a curated selection of algorithms designed specifically for each formulation. In addition, we elucidate the theoretical underpinnings that reveal the mathematical relations among common problem formulations. We conclude with a discussion of the current state and future directions of safe reinforcement learning research.
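For reference, the constrained criterion mentioned in the abstract is usually formalized as a constrained Markov decision process (CMDP) objective. The following is the standard textbook formulation (as in Altman's CMDP framework), not an equation reproduced from this paper:

$$
\max_{\pi} \ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\quad i = 1, \dots, m,
$$

where $r$ is the reward function, $c_i$ are safety cost functions, and $d_i$ are the corresponding budgets.

A common way to attack such problems is Lagrangian relaxation with primal-dual updates. The sketch below is a minimal, self-contained illustration on a toy one-step CMDP; it assumes nothing about the specific algorithms surveyed in the paper, and all names and constants are illustrative:

```python
import numpy as np

# Toy one-step CMDP with two actions, solved by Lagrangian relaxation:
#   max_pi E[r]  s.t.  E[c] <= d, with pi a softmax over two logits.
rewards = np.array([1.0, 0.5])   # reward of each action
costs   = np.array([1.0, 0.2])   # safety cost of each action
d       = 0.5                    # cost budget

theta = np.zeros(2)              # policy logits (primal variable)
lam   = 0.0                      # Lagrange multiplier (dual variable)

for _ in range(5000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    exp_r, exp_c = pi @ rewards, pi @ costs

    # Primal step: gradient ascent on the Lagrangian L = E[r] - lam * (E[c] - d)
    adv = (rewards - lam * costs) - (exp_r - lam * exp_c)
    theta += 0.05 * pi * adv

    # Dual step: projected gradient ascent on the multiplier
    lam = max(0.0, lam + 0.05 * (exp_c - d))

print(f"policy={pi}, expected reward={exp_r:.3f}, expected cost={exp_c:.3f}")
```

The multiplier grows while the constraint is violated, pushing the policy toward the low-cost action until the expected cost roughly meets the budget (about 0.375 probability on the high-cost action in this toy example).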
