A Survey of Constraint Formulations in Safe Reinforcement Learning (2402.02025v2)
Abstract: Safety is critical when applying reinforcement learning (RL) to real-world problems. As a result, safe RL has emerged as a fundamental and powerful paradigm for optimizing an agent's policy while incorporating notions of safety. A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward subject to specific safety constraints. Despite recent efforts to enhance safety in RL, a systematic understanding of the field remains elusive. This challenge stems from the diversity of constraint representations and the limited exploration of their interrelations. To bridge this knowledge gap, we present a comprehensive review of representative constraint formulations, along with a curated selection of algorithms designed specifically for each formulation. In addition, we elucidate the theoretical underpinnings that reveal the mathematical relations among common problem formulations. We conclude with a discussion of the current state and future directions of safe reinforcement learning research.
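For concreteness, the constrained criterion the abstract refers to is standardly posed as a constrained Markov decision process in the sense of Altman's CMDP framework. The following is a minimal sketch of that objective; the cost signal $c$ and safety budget $d$ are illustrative symbols chosen here, not notation taken from the paper:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d,
```

where $r$ is the reward, $\gamma \in [0,1)$ the discount factor, and the expectation is over trajectories induced by policy $\pi$. Constraining the expected cumulative cost is only one representative choice; other formulations covered in the survey (e.g., chance constraints or instantaneous hard constraints) replace this expectation-based constraint with different functionals of the cost.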