
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes (2304.12477v4)

Published 24 Apr 2023 in math.OC and cs.AI

Abstract: Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit the standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal when the risk level is discretized sufficiently finely. However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level. In particular, we show that a saddle point property assumed to hold in prior literature may be violated. In contrast, a decomposition does hold for Value-at-Risk (VaR), and our proof demonstrates how this risk measure differs from CVaR and EVaR. Our findings are significant because risk-averse algorithms are used in high-stakes environments, making their correctness much more critical.
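For context, the three static risk measures named in the abstract admit standard variational forms for a loss (cost) random variable X at confidence level α ∈ (0, 1). The sketch below gives the textbook definitions (the Rockafellar–Uryasev representation for CVaR and Ahmadi-Javid's for EVaR); these are background, not formulas reproduced from the paper, and sign and risk-level conventions vary across the literature:

\[
\operatorname{VaR}_{\alpha}(X) \;=\; \inf\bigl\{\, z \in \mathbb{R} \,:\, \Pr(X \le z) \ge \alpha \,\bigr\},
\]
\[
\operatorname{CVaR}_{\alpha}(X) \;=\; \min_{z \in \mathbb{R}} \Bigl\{\, z + \tfrac{1}{1-\alpha}\,\mathbb{E}\bigl[(X - z)_{+}\bigr] \,\Bigr\},
\]
\[
\operatorname{EVaR}_{\alpha}(X) \;=\; \inf_{\beta > 0} \tfrac{1}{\beta} \log\!\Bigl( \tfrac{\mathbb{E}\bigl[e^{\beta X}\bigr]}{1 - \alpha} \Bigr).
\]

Roughly speaking, the auxiliary variable in each variational form (z for CVaR, β for EVaR) plays the role of the risk level that the augmented-state decompositions thread through the dynamic program, and the abstract's negative result concerns whether this outer optimization can validly be interchanged with the MDP's temporal recursion.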
