On the Maximization of Long-Run Reward CVaR for Markov Decision Processes (2312.01586v1)

Published 4 Dec 2023 in math.OC, cs.SY, and eess.SY

Abstract: This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directions, we prove that the maximum of long-run CVaR of MDPs over the set of history-dependent randomized policies can be found within the class of stationary randomized policies. In contrast to classical MDPs, we find that there may not exist an optimal stationary deterministic policy for maximizing CVaR. Instead, we prove the existence of an optimal stationary randomized policy that requires randomizing over at most two actions. Via a convex optimization representation of CVaR, we convert the long-run CVaR maximization MDP into a minimax problem, where we prove the interchangeability of minimum and maximum and the related existence of saddle point solutions. Furthermore, we propose an algorithm that finds the saddle point solution by solving two linear programs. These results are then extended to objectives that maximize a combination of the mean and CVaR of rewards. Finally, we conduct numerical experiments to demonstrate the main results.
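
The minimax reduction in the abstract rests on the convex (Rockafellar-Uryasev) representation of CVaR. As a rough illustration only, not the paper's two-linear-program algorithm, the sketch below evaluates that representation for a fixed discrete reward distribution by scanning the threshold variable eta over the support. It assumes, consistent with the risk-seeking framing and the minimax structure, that CVaR here denotes the upper-tail average of rewards, CVaR_alpha(R) = min over eta of { eta + E[(R - eta)^+] / (1 - alpha) }; the function and variable names are illustrative.

import numpy as np

def upper_tail_cvar(values, probs, alpha):
    """CVaR_alpha(R) as the expected reward over the best (1 - alpha) fraction
    of outcomes, via the Rockafellar-Uryasev representation
    min over eta of { eta + E[(R - eta)^+] / (1 - alpha) } (illustrative sketch)."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    assert np.isclose(probs.sum(), 1.0) and 0.0 < alpha < 1.0
    # The objective is convex and piecewise linear in eta, with breakpoints at
    # the support points, so scanning the support locates the minimum exactly.
    candidates = [eta + np.sum(probs * np.maximum(values - eta, 0.0)) / (1.0 - alpha)
                  for eta in values]
    return min(candidates)

# Rewards 0, 1, 2 with probabilities 0.2, 0.5, 0.3; alpha = 0.7 averages over
# the best 30% of outcomes, so the result is 2.0.
print(upper_tail_cvar([0.0, 1.0, 2.0], [0.2, 0.5, 0.3], alpha=0.7))

In the MDP setting described in the abstract, an outer maximization over stationary randomized policies wraps around this inner minimization over eta; that max-min structure is the minimax problem whose saddle points the paper characterizes and computes by solving two linear programs.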

Authors (3)
  1. Li Xia (25 papers)
  2. Zhihui Yu (2 papers)
  3. Peter W. Glynn (52 papers)
