
Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP (2406.07892v1)

Published 12 Jun 2024 in cs.LG and cs.AI

Abstract: Motivated by risk-sensitive reinforcement learning scenarios, we consider the problem of policy evaluation for variance in a discounted reward Markov decision process (MDP). For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature, though only asymptotic guarantees are available for this algorithm. We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed with and without regularization. Our bounds exhibit exponential decay for the initial error, while the overall bound is $O(1/t)$, where $t$ is the number of update iterations of the TD algorithm. Further, the bound for the regularized TD variant holds for a universal step size. Our bounds open avenues for analysis of actor-critic algorithms for mean-variance optimization in a discounted MDP.
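The abstract refers to a TD-type algorithm with linear function approximation, tail iterate averaging, and an optional regularized variant. The following Python snippet is a minimal, hypothetical sketch of such a scheme for mean-variance policy evaluation, assuming an environment sampler `env_step` and a feature map `phi` supplied by the user; the coupled second-moment update and the regularization below follow standard mean-variance TD constructions and are not taken from the paper, whose exact recursion may differ.

```python
import numpy as np

def tail_averaged_td_mean_variance(env_step, phi, d, gamma, alpha, T,
                                   tail_frac=0.5, reg=0.0):
    """
    Hypothetical sketch: TD-type updates with linear function approximation
    for mean-variance policy evaluation, with tail iterate averaging.

    env_step(): returns (s, r, s_next) sampled under the policy being evaluated.
    phi(s):     feature map returning a length-d numpy array.
    theta_v estimates the value function, theta_u the second-moment function;
    the variance at state s is then theta_u @ phi(s) - (theta_v @ phi(s)) ** 2.
    """
    theta_v = np.zeros(d)   # value-function parameters
    theta_u = np.zeros(d)   # second-moment parameters
    tail_start = int(tail_frac * T)
    avg_v, avg_u, n_avg = np.zeros(d), np.zeros(d), 0

    for t in range(T):
        s, r, s_next = env_step()
        f, f_next = phi(s), phi(s_next)

        v_next = theta_v @ f_next
        # TD errors for the value and second-moment functions
        delta_v = r + gamma * v_next - theta_v @ f
        delta_u = r ** 2 + 2 * gamma * r * v_next \
                  + gamma ** 2 * (theta_u @ f_next) - theta_u @ f

        # Optional ridge-style regularization (stand-in for the regularized
        # variant mentioned in the abstract, which admits a universal step size)
        theta_v += alpha * (delta_v * f - reg * theta_v)
        theta_u += alpha * (delta_u * f - reg * theta_u)

        # Tail iterate averaging: average only the last (1 - tail_frac) fraction
        if t >= tail_start:
            n_avg += 1
            avg_v += (theta_v - avg_v) / n_avg
            avg_u += (theta_u - avg_u) / n_avg

    return avg_v, avg_u
```

The tail-averaged parameters, rather than the final iterates, are returned because the paper's $O(1/t)$ finite-sample bounds are stated for the tail-averaged TD iterates.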

Authors (3)
  1. Tejaram Sangadi (1 paper)
  2. L. A. Prashanth (12 papers)
  3. Krishna Jagannathan (42 papers)
