Risk-sensitive Actor-free Policy via Convex Optimization (2307.00141v1)
Published 30 Jun 2023 in cs.LG
Abstract: Traditional reinforcement learning methods optimize agents without considering safety, potentially resulting in unintended consequences. In this paper, we propose an optimal actor-free policy that optimizes a risk-sensitive criterion based on the conditional value at risk. The risk-sensitive objective function is modeled using an input-convex neural network, which ensures convexity with respect to the actions and enables the identification of globally optimal actions through simple gradient-following methods. Experimental results demonstrate the efficacy of our approach in maintaining effective risk control.
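The abstract compresses two ideas: (i) the critic is convex in the action by construction, and (ii) action selection therefore reduces to a convex optimization solvable by plain gradient following. Below is a minimal PyTorch sketch of that construction, not the paper's released code: the names (`ConditionalICNN`, `optimal_action`), network sizes, optimizer settings, and the box-constrained action space are all illustrative assumptions. Convexity in the action follows the input-convex recipe of Amos et al. (2017): hidden-to-hidden weights are kept non-negative and activations are convex and non-decreasing, while the state enters only through bias-like terms that cannot break convexity in the action.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalICNN(nn.Module):
    """Risk critic f(s, a) that is convex in the action a for every fixed state s.

    Simplified input-convex construction (Amos et al., 2017): hidden-to-hidden
    weights are reparameterized to be non-negative and the activation (softplus)
    is convex and non-decreasing, so convexity in `a` is preserved; the state
    enters each layer only as an affine bias term.
    """

    def __init__(self, state_dim, action_dim, hidden=64, depth=2):
        super().__init__()
        # Hidden-to-hidden maps; their weights are made non-negative in forward()
        self.Wz = nn.ModuleList([nn.Linear(hidden, hidden, bias=False) for _ in range(depth)])
        # Direct (unconstrained) action pathway into every layer, affine in a
        self.Wa = nn.ModuleList([nn.Linear(action_dim, hidden) for _ in range(depth + 1)])
        # State pathway: acts as a state-dependent bias at every layer
        self.Us = nn.ModuleList([nn.Linear(state_dim, hidden, bias=False) for _ in range(depth + 1)])
        self.out = nn.Linear(hidden, 1)

    def forward(self, s, a):
        z = F.softplus(self.Wa[0](a) + self.Us[0](s))
        for Wz, Wa, Us in zip(self.Wz, self.Wa[1:], self.Us[1:]):
            # F.softplus(Wz.weight) keeps the hidden-to-hidden weights >= 0
            z = F.softplus(F.linear(z, F.softplus(Wz.weight)) + Wa(a) + Us(s))
        # Non-negative output weights: a non-negative sum of convex units is convex
        return F.linear(z, F.softplus(self.out.weight), self.out.bias)

def optimal_action(model, s, action_dim, steps=100, lr=0.05):
    """Projected gradient descent on a; convexity in a means the only
    stationary point this can reach is the global minimizer."""
    a = torch.zeros(s.shape[0], action_dim, requires_grad=True)
    opt = torch.optim.Adam([a], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(s, a).sum().backward()  # network weights stay fixed; only a is updated
        opt.step()
        with torch.no_grad():
            a.clamp_(-1.0, 1.0)  # project onto the assumed box action space
    return a.detach()
```

For example, `optimal_action(ConditionalICNN(8, 2), torch.randn(4, 8), action_dim=2)` returns a batch of box-constrained minimizers. Fitting the critic itself to the CVaR-based risk objective is a separate training step the abstract does not detail, so it is omitted from this sketch.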