Data-Driven Merton's Strategies via Policy Randomization (2312.11797v2)
Abstract: We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. The agent under consideration is a price taker who has access only to the stock and factor value processes and the instantaneous volatility. We propose an auxiliary problem in which the agent can invoke policy randomization according to a specific class of Gaussian distributions, and prove that the mean of its optimal Gaussian policy solves the original Merton problem. With randomized policies, we are in the realm of continuous-time reinforcement learning (RL) recently developed in Wang et al. (2020) and Jia and Zhou (2022a, 2022b, 2023), enabling us to solve the auxiliary problem in a data-driven way without having to estimate the model primitives. Specifically, we establish a policy improvement theorem, based on which we design both online and offline actor-critic RL algorithms for learning Merton's strategies. A key insight from this study is that RL in general, and policy randomization in particular, are useful beyond the purpose of exploration: they can be employed as a technical tool to solve a problem that cannot otherwise be solved by mere deterministic policies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the decisive outperformance of the devised RL algorithms over the conventional model-based, plug-in method.
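The following is a minimal, illustrative sketch of the policy-randomization idea the abstract describes, not the paper's actual algorithm: in a toy constant-coefficient market with log utility (where the classical Merton fraction is μ/σ²), actions are sampled from a Gaussian policy and the policy mean is updated by a score-function (REINFORCE-style) gradient estimate, so the agent learns the Merton fraction from simulated returns without ever estimating μ or σ. All parameter values (`mu`, `sigma`, policy std `s`, learning rate) are hypothetical choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.05, 0.2      # true market primitives, unknown to the learning agent
s = 0.3                    # fixed std of the Gaussian randomized policy
theta, lr = 0.0, 5.0       # policy mean (the quantity being learned) and step size
batch = 4096

hist = []
for _ in range(300):
    a = theta + s * rng.standard_normal(batch)    # randomized portfolio fractions
    eps = rng.standard_normal(batch)              # market noise
    # one-period log-wealth increment under fraction a (Ito correction included)
    r = a * mu - 0.5 * a**2 * sigma**2 + a * sigma * eps
    # score-function gradient with a mean baseline to reduce variance
    grad = np.mean((r - r.mean()) * (a - theta) / s**2)
    theta += lr * grad
    hist.append(theta)

theta_hat = np.mean(hist[-50:])   # average out residual stochastic-gradient noise
print(theta_hat)                  # close to the Merton fraction mu / sigma**2 = 1.25
```

The point of the sketch is the one highlighted in the abstract: the mean of the optimal Gaussian policy recovers the deterministic Merton strategy, and the agent reaches it purely from sampled returns.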
- Bergman YZ (1985) Time preference and capital asset pricing models. Journal of Financial Economics 14(1):145–159.
- Chacko G, Viceira LM (2005) Dynamic consumption and portfolio choice with stochastic volatility in incomplete markets. The Review of Financial Studies 18(4):1369–1402.
- Drimus GG (2012) Options on realized variance by transform methods: A non-affine stochastic volatility model. Quantitative Finance 12(11):1679–1694.
- Duffie D, Epstein LG (1992) Stochastic differential utility. Econometrica 60(2):353–394.
- Geman S, Hwang CR (1986) Diffusions for global optimization. SIAM Journal on Control and Optimization 24(5):1031–1043.
- Han J, E W (2016) Deep learning approximation for stochastic control problems. arXiv preprint arXiv:1611.07422 .
- Jia Y, Zhou XY (2022a) Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research 23(154):1–55.
- Jia Y, Zhou XY (2022b) Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. Journal of Machine Learning Research 23(275):1–50.
- Jia Y, Zhou XY (2023) q-Learning in continuous time. Journal of Machine Learning Research 24(161):1–61.
- Kraft H (2005) Optimal portfolios and Heston’s stochastic volatility model: An explicit solution for power utility. Quantitative Finance 5(3):303–313.
- Kydland FE, Prescott EC (1982) Time to build and aggregate fluctuations. Econometrica 50(6):1345–1370.
- Liu J (2007) Portfolio selection in stochastic environments. The Review of Financial Studies 20(1):1–39.
- Luenberger DG (1998) Investment Science (Oxford University Press: New York).
- Markowitz H (1952) Portfolio selection. The Journal of Finance 7(1):77–91.
- Merton RC (1969) Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics 51(3):247–257.
- Merton RC (1980) On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics 8(4):323–361.
- Wachter JA (2002) Portfolio and consumption decisions under mean-reverting returns: An exact solution for complete markets. Journal of Financial and Quantitative Analysis 37(1):63–91.
- Wang H, Zhou XY (2020) Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance 30(4):1273–1308.