Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints (2304.08743v2)

Published 18 Apr 2023 in cs.LG and cs.RO

Abstract: This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, each action taken by the learning system must comply with certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics control environments, encompassing multiple action constraint types. Our evaluation provides the first in-depth perspective of the field, revealing surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and associated code utilized in our experiments are made available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.
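For intuition, the following is a minimal sketch of what enforcing an action constraint at execution time can look like: a wrapper that projects every policy action onto an L2-norm ball before it reaches the environment. The wrapper class, the max_norm parameter, and the use of Gymnasium's ActionWrapper are illustrative assumptions for this sketch, not the benchmark's actual constraint set or implementation (the paper covers multiple constraint types and algorithms; see the linked repository for the real code).

```python
# A minimal sketch (assumed example, not the paper's exact baseline):
# a Gymnasium ActionWrapper that projects each action onto an L2-norm-ball
# constraint ||a||_2 <= max_norm before the environment executes it.
import numpy as np
import gymnasium as gym


class NormConstrainedActions(gym.ActionWrapper):
    """Enforce ||a||_2 <= max_norm by radial projection of the action."""

    def __init__(self, env, max_norm=1.0):
        super().__init__(env)
        self.max_norm = max_norm

    def action(self, action):
        # Scale the action back onto the ball if it violates the constraint.
        action = np.asarray(action, dtype=np.float64)
        norm = np.linalg.norm(action)
        if norm > self.max_norm:
            action = action * (self.max_norm / norm)
        return action


if __name__ == "__main__":
    # Wrap a standard continuous-control task; whatever raw action a policy
    # produces, the executed action always satisfies the constraint.
    env = NormConstrainedActions(gym.make("Pendulum-v1"), max_norm=1.0)
    obs, _ = env.reset(seed=0)
    for _ in range(5):
        obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, _ = env.reset()
```

Projection of this kind is one simple way to guarantee feasibility at every step; the benchmark compares such straightforward baselines against more involved approaches (e.g., differentiable projection layers or Frank-Wolfe-based policy updates) across several robotics environments and constraint types.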
