Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty (2404.12598v1)

Published 19 Apr 2024 in cs.LG, cs.SY, eess.SY, q-fin.CP, and q-fin.PM

Abstract: This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either from the agent's risk attitude or as a distributionally robust approach against model uncertainty. Owing to the martingale perspective in Jia and Zhou (2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, which captures the variability of the value-to-go along the trajectory. This characterization allows existing RL algorithms developed for non-risk-sensitive settings to be adapted straightforwardly to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation; however, q-learning offers a solution and extends to infinite-horizon settings. Finally, I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of the temperature parameter on the behavior of the learning procedure. I also conduct simulation experiments to demonstrate how risk-sensitive RL improves finite-sample performance in the linear-quadratic control problem.
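
To make the abstract's recipe concrete, below is a minimal sketch (in Python) of how a discretized temporal-difference/martingale loss might be augmented with the realized quadratic variation of the value process. The function name `risk_sensitive_td_loss`, the per-step residual form, the `0.5 * risk_coef` scaling, and the synthetic data are illustrative assumptions, not the paper's algorithm; the exact characterization in the paper involves both the value function and the q-function.

```python
import numpy as np

def risk_sensitive_td_loss(values, rewards, dt, risk_coef):
    """Mean-squared martingale (TD) loss on a discretized trajectory,
    augmented by the realized quadratic variation of the value process.

    values    : (N+1,) array of value estimates V(t_k, X_{t_k})
    rewards   : (N,)   array of running rewards collected on [t_k, t_{k+1})
    dt        : time-step size of the discretization
    risk_coef : risk-sensitivity coefficient scaling the penalty
               (hypothetical parameterization for illustration)
    """
    dV = np.diff(values)          # increments of the value process
    qv_increments = dV ** 2       # realized quadratic variation per step
    # Per-step residuals: with risk_coef = 0 this reduces to an ordinary
    # continuous-time TD residual; the extra term penalizes the variability
    # of the value-to-go along the trajectory.
    residuals = dV + rewards * dt + 0.5 * risk_coef * qv_increments
    return np.sum(residuals ** 2)

# Toy usage with synthetic data (purely illustrative).
rng = np.random.default_rng(0)
values = np.cumsum(rng.normal(scale=0.1, size=101))
rewards = rng.normal(size=100)
loss = risk_sensitive_td_loss(values, rewards, dt=0.01, risk_coef=1.0)
print(f"penalized martingale loss: {loss:.4f}")
```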

References (63)
  1. Model-based reinforcement learning in continuous environments using real-time constrained optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29.
  2. Andradóttir, S. (1995). A stochastic approximation algorithm with varying bounds. Operations Research, 43(6):1037–1048.
  3. Baird, L. C. (1994). Reinforcement learning in continuous time: Advantage updating. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), volume 4, pages 2448–2453. IEEE.
  4. Risk-sensitive dynamic asset management. Applied Mathematics and Optimization, 39:337–360.
  5. Distributionally robust mean-variance portfolio selection with Wasserstein distances. Management Science, 68(9):6382–6410.
  6. Borkar, V. S. (2002). Q-learning for risk-sensitive control. Mathematics of Operations Research, 27(2):294–311.
  7. General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Operations Research, 59(5):1211–1224.
  8. Risk-sensitive and robust decision-making: A CVaR optimization approach. Advances in Neural Information Processing Systems, 28.
  9. Learning equilibrium mean-variance strategy. Mathematical Finance, 33(4):1166–1212.
  10. Learning Merton’s strategies in an incomplete market: Recursive entropy regularization and biased Gaussian exploration. arXiv preprint arXiv:2312.11797.
  11. A dynamic mean-variance analysis for log returns. Management Science, 67(2):1093–1108.
  12. Risk-sensitive Investment Management, volume 19. World Scientific, Singapore.
  13. Asymptotic evaluation of certain Markov process expectations for large time. IV. Communications on Pure and Applied Mathematics, 36(2):183–212.
  14. Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1):219–245.
  15. Stochastic differential utility. Econometrica, pages 353–394.
  16. Robust properties of risk-sensitive control. Mathematics of Control, Signals and Systems, 13:318–332.
  17. Risk-sensitive soft actor-critic for robust deep reinforcement learning under distribution shifts. arXiv preprint arXiv:2402.09992.
  18. Substitution, risk aversion, and the temporal behavior of consumption. Econometrica, 57(4):937–969.
  19. Exponential Bellman equation and improved regret bounds for risk-sensitive reinforcement learning. Advances in Neural Information Processing Systems, 34:20436–20446.
  20. Risk-sensitive reinforcement learning: Near-optimal risk-sample tradeoff in regret. Advances in Neural Information Processing Systems, 33:22384–22395.
  21. Risk-sensitive control on an infinite time horizon. SIAM Journal on Control and Optimization, 33(6):1881–1915.
  22. On stochastic relaxed control for partially observed diffusions. Nagoya Mathematical Journal, 93:71–108.
  23. Risk-sensitive control and an optimal investment model II. The Annals of Applied Probability, 12(2):730–767.
  24. Actor-critic learning for mean-field control in continuous time. arXiv preprint arXiv:2303.06993.
  25. Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18(2):141–153.
  26. Robust portfolio control with stochastic factor dynamics. Operations Research, 61(4):874–893.
  27. Entropy regularization for mean field games with learning. Mathematics of Operations Research, 47(4):3239–3260.
  28. Robust control and model uncertainty. American Economic Review, 91(2):60–66.
  29. Robustness and ambiguity in continuous time. Journal of Economic Theory, 146(3):1195–1223.
  30. Jacobson, D. (1973). Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Transactions on Automatic Control, 18(2):124–131.
  31. Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research, 23(1):6918–6972.
  32. Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. Journal of Machine Learning Research, 23(1):12603–12652.
  33. q-Learning in continuous time. Journal of Machine Learning Research, 24(161):1–61.
  34. The reinforcement learning Kelly strategy. Quantitative Finance, 22(8):1445–1464.
  35. Is Q-learning provably efficient? Advances in Neural Information Processing Systems, 31.
  36. Hamilton-Jacobi deep Q-learning for deterministic continuous-time systems with Lipschitz continuous controls. Journal of Machine Learning Research, 22(206):1–34.
  37. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274.
  38. Stochastic Approximation and Recursive Algorithms, volume 35. Springer-Verlag, New York, 2nd edition.
  39. Lai, T. L. (2003). Stochastic approximation. The Annals of Statistics, 31(2):391–406.
  40. Policy iterations for reinforcement learning problems in continuous time and space—Fundamental theory and methods. Automatica, 126:109421.
  41. Knight on risk and uncertainty. Journal of Political Economy, 95(2):394–406.
  42. Maenhout, P. J. (2004). Robust portfolio rules and asset pricing. Review of Financial Studies, 17(4):951–983.
  43. Remarks on risk-sensitive control problems. Applied Mathematics and Optimization, 52:297–310.
  44. Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics, pages 247–257.
  45. Nagai, H. (1996). Bellman equations of risk-sensitive control. SIAM Journal on Control and Optimization, 34(1):74–101.
  46. Continuous Martingales and Brownian Motion, volume 293. Springer Science & Business Media, Berlin.
  47. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407.
  48. A convergence theorem for non negative almost supermartingales and some applications. Optimizing Methods in Statistics, pages 233–257.
  49. Equivalence between policy gradients and soft Q-learning. arXiv preprint arXiv:1704.06440.
  50. Skiadas, C. (2003). Robust control and recursive utility. Finance and Stochastics, 7:475–489.
  51. Sun, Y. (2006). The exact law of large numbers via Fubini extension and characterization of insurable risks. Journal of Economic Theory, 126(1):31–69.
  52. Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning. SIAM Journal on Control and Optimization, 62(1):135–166.
  53. Making deep Q-learning methods robust to time discretization. In International Conference on Machine Learning, pages 6096–6104. PMLR.
  54. Exploratory HJB equations and their convergence. SIAM Journal on Control and Optimization, 60(6):3191–3216.
  55. Reinforcement learning for continuous-time optimal execution: actor-critic algorithm and error analysis. Available at SSRN 4378950.
  56. Reinforcement learning in continuous time and space: A stochastic control approach. Journal of Machine Learning Research, 21(198):1–34.
  57. Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance, 30(4):1273–1308.
  58. A finite sample complexity bound for distributionally robust Q-learning. In International Conference on Artificial Intelligence and Statistics, pages 3370–3398. PMLR.
  59. Continuous-time q-learning for McKean-Vlasov control problems. arXiv preprint arXiv:2306.16208.
  60. Risk-sensitive Markov decision process and learning under general utility functions. arXiv preprint arXiv:2311.13589.
  61. Regret bounds for Markov decision processes with recursive optimized certainty equivalents. In International Conference on Machine Learning, pages 38400–38427. PMLR.
  62. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, New York, NY.
  63. Zhou, X. Y. (1992). On the existence of optimal relaxed controls of stochastic partial differential equations. SIAM Journal on Control and Optimization, 30(2):247–261.
