
Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training Time Safety Violations (2312.07828v2)

Published 13 Dec 2023 in eess.SY and cs.SY

Abstract: This paper introduces the reinforcement learning backup shield (RLBUS), an algorithm that guarantees safe exploration in reinforcement learning (RL) by incorporating backup control barrier functions (BCBFs). RLBUS constructs an implicit control forward-invariant subset of the safe set using multiple backup policies, ensuring safety in the presence of input constraints. While traditional BCBFs often result in conservative control forward-invariant sets due to the design of backup controllers, RLBUS addresses this limitation by leveraging model-free RL to train an additional backup policy, which enlarges the identified control forward-invariant subset of the safe set. This approach enables the exploration of larger regions of the state space with zero safety violations during training. The effectiveness of RLBUS is demonstrated on an inverted pendulum example, where the expanded invariant set allows for safe exploration over a broader state space, enhancing performance without compromising safety.
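
The backup-shield idea in the abstract can be illustrated with a minimal sketch. This is not the paper's RLBUS algorithm or its BCBF construction: it is a simplified discrete-time switching shield that accepts an RL action only if some backup policy can still steer the resulting state into a backup set without leaving the safe set. The pendulum model, the constants, the PD gains, the circular backup set, and every function name below are hypothetical choices made for illustration. In particular, `learned_backup` is only a stand-in for a backup policy trained with model-free RL, and the backup set is assumed (not proven) to be invariant under the backup policies.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
U_MAX = 2.0        # input constraint |u| <= U_MAX
THETA_MAX = 1.0    # safe set: |theta| <= THETA_MAX
DT = 0.01          # Euler integration step
HORIZON = 500      # number of backup rollout steps (5 s)

def clip(u):
    """Saturate a control input to the admissible range."""
    return max(-U_MAX, min(U_MAX, u))

def step(state, u):
    """One Euler step of a normalized inverted pendulum (assumed model)."""
    theta, omega = state
    return (theta + DT * omega, omega + DT * (math.sin(theta) + clip(u)))

def in_safe_set(state):
    return abs(state[0]) <= THETA_MAX

def in_backup_set(state):
    # Small ball around the upright equilibrium; assumed invariant
    # under the backup policies for the purpose of this sketch.
    return state[0] ** 2 + state[1] ** 2 <= 0.1 ** 2

def pd_backup(state):
    """Hand-designed saturated PD backup policy (hypothetical gains)."""
    return clip(-3.0 * state[0] - 3.0 * state[1])

def learned_backup(state):
    """Placeholder for an RL-trained backup policy (here just another PD law)."""
    return clip(-6.0 * state[0] - 4.0 * state[1])

def is_recoverable(state, policy):
    """True if the backup rollout stays safe and ends in the backup set."""
    for _ in range(HORIZON):
        if not in_safe_set(state):
            return False
        state = step(state, policy(state))
    return in_safe_set(state) and in_backup_set(state)

def shield(state, rl_action, backups):
    """Pass the RL action through if the next state is recoverable by
    at least one backup policy; otherwise apply a backup action."""
    candidate = step(state, rl_action)
    if any(is_recoverable(candidate, pi) for pi in backups):
        return clip(rl_action)
    for pi in backups:
        if is_recoverable(state, pi):
            return pi(state)
    return backups[0](state)  # last resort if no certificate exists

if __name__ == "__main__":
    backups = [pd_backup, learned_backup]
    print(shield((0.3, 0.0), 1.0, backups))
```

In this sketch, adding a policy to `backups` can only enlarge the set of states that pass the recoverability check, which mirrors the abstract's point that an additional trained backup policy enlarges the identified invariant subset of the safe set.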

Authors (2)
  1. Pedram Rabiee (8 papers)
  2. Amirsaeid Safari (4 papers)
