Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints (2304.08743v2)
Abstract: This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms. In action-constrained RL, each action taken by the learning system must comply with certain constraints. These constraints are crucial for ensuring the feasibility and safety of actions in real-world systems. We evaluate existing algorithms and their novel variants across multiple robotics control environments covering several types of action constraints. Our evaluation provides the first in-depth perspective on the field, revealing surprising insights, including the effectiveness of a straightforward baseline approach. The benchmark problems and associated code used in our experiments are available online at github.com/omron-sinicx/action-constrained-RL-benchmark for further research and development.
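To make the notion of a per-step action constraint concrete, the snippet below is a minimal sketch (not the benchmark's own code) that projects a raw policy output onto an L2-norm ball before it is sent to the environment. The function name, the norm bound, and the 6-dimensional action are assumptions chosen purely for illustration; the benchmark itself considers other constraint types as well.

```python
import numpy as np

def project_action(raw_action: np.ndarray, max_norm: float = 1.0) -> np.ndarray:
    """Project a raw policy output onto the feasible set {a : ||a||_2 <= max_norm}.

    This closed-form projection illustrates one simple way to enforce an
    action constraint at every step; linear or state-dependent constraints
    would typically require solving a small QP instead.
    """
    norm = np.linalg.norm(raw_action)
    if norm <= max_norm:
        return raw_action                      # already feasible
    return raw_action * (max_norm / norm)      # scale back onto the ball

# Usage: enforce the constraint on an action sampled from an unconstrained policy.
rng = np.random.default_rng(0)
a_raw = rng.normal(size=6)                     # e.g., a 6-DoF torque command
a_safe = project_action(a_raw, max_norm=1.0)
assert np.linalg.norm(a_safe) <= 1.0 + 1e-9
```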