Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality (2312.03618v2)

Published 6 Dec 2023 in math.OC and cs.GT

Abstract: Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known for average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, that history-dependent (Markovian) policies strictly outperform stationary policies for average optimality in s-rectangular RMDPs. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that approximate Blackwell optimal policies always exist, although Blackwell optimal policies may not exist. We also provide a sufficient condition for their existence, which encompasses virtually any examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games.
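For readers unfamiliar with the criteria named in the abstract, here is a minimal formal sketch in standard notation (not taken from the paper): pi denotes a policy, U the uncertainty set of transition kernels, r the reward function, and expectations are over trajectories induced by pi and a kernel P in U.

% Worst-case discounted return of a policy \pi, for a discount factor \gamma \in (0,1):
\[ R_{\gamma}(\pi) \;=\; \inf_{P \in \mathcal{U}} \; \mathbb{E}^{\pi,P}\!\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \Big]. \]

% Worst-case long-run average return (the "average optimality" criterion):
\[ R_{\mathrm{avg}}(\pi) \;=\; \inf_{P \in \mathcal{U}} \; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi,P}\!\Big[ \sum_{t=0}^{T-1} r(s_t, a_t) \Big]. \]

% Blackwell optimality: a policy \pi^{\star} is Blackwell optimal if there exists
% \bar{\gamma} \in (0,1) such that \pi^{\star} maximizes R_{\gamma} for every \gamma \in (\bar{\gamma}, 1),
% i.e., it remains discount optimal for all discount factors sufficiently close to 1.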
