
Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces (2403.05811v4)

Published 9 Mar 2024 in stat.ML and cs.LG

Abstract: Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$. Distributional temporal difference learning has accordingly been proposed, extending classic temporal difference (TD) learning in RL. In this paper, we focus on the non-asymptotic statistical rates of distributional TD. To facilitate theoretical analysis, we propose non-parametric distributional TD (NTD). For a $\gamma$-discounted infinite-horizon tabular Markov decision process, we show that for NTD with a generative model, we need $\tilde{O}(\varepsilon^{-2}\mu_{\min}^{-1}(1-\gamma)^{-3})$ interactions with the environment to achieve an $\varepsilon$-optimal estimator with high probability, when the estimation error is measured by the $1$-Wasserstein distance. This sample complexity bound is minimax optimal up to logarithmic factors. In addition, we revisit categorical distributional TD (CTD), showing that the same non-asymptotic convergence bounds hold for CTD in the case of the $1$-Wasserstein distance. We also extend our analysis to the more general setting where the data generating process is Markovian. In the Markovian setting, we propose variance-reduced variants of NTD and CTD, and show that both can achieve a $\tilde{O}(\varepsilon^{-2} \mu_{\pi,\min}^{-1}(1-\gamma)^{-3}+t_{\mathrm{mix}}\mu_{\pi,\min}^{-1}(1-\gamma)^{-1})$ sample complexity bound in the case of the $1$-Wasserstein distance, which matches the state-of-the-art statistical results for classic policy evaluation. To achieve the sharp statistical rates, we establish a novel Freedman's inequality in Hilbert spaces. This new Freedman's inequality would be of independent interest for statistical analysis of various infinite-dimensional online learning problems.
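To make the object of study concrete, below is a minimal Python sketch of one tabular categorical distributional TD (CTD) update for policy evaluation, the kind of procedure whose statistical rates the paper analyzes. This is not code from the paper: the function names, grid parameters, and toy two-state example are illustrative assumptions; the projection step follows the standard categorical projection onto a fixed, evenly spaced support.

```python
import numpy as np

def categorical_projection(atoms, shifted_atoms, probs):
    """Project a distribution supported on `shifted_atoms` (with weights `probs`)
    back onto the fixed grid `atoms` by splitting mass between neighbouring atoms."""
    K = len(atoms)
    z_min, z_max = atoms[0], atoms[-1]
    delta = atoms[1] - atoms[0]              # assumes an evenly spaced grid
    projected = np.zeros(K)
    clipped = np.clip(shifted_atoms, z_min, z_max)
    b = (clipped - z_min) / delta            # fractional grid index of each shifted atom
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    for k in range(len(shifted_atoms)):
        if lower[k] == upper[k]:             # shifted atom lands exactly on a grid point
            projected[lower[k]] += probs[k]
        else:                                # otherwise split mass between the two neighbours
            projected[lower[k]] += probs[k] * (upper[k] - b[k])
            projected[upper[k]] += probs[k] * (b[k] - lower[k])
    return projected

def ctd_update(eta, s, r, s_next, atoms, gamma, step_size):
    """One CTD step: push the next-state return distribution through the map
    z -> r + gamma * z, project onto the grid, and move eta[s] toward the target."""
    target = categorical_projection(atoms, r + gamma * atoms, eta[s_next])
    eta[s] = (1.0 - step_size) * eta[s] + step_size * target
    return eta

# Toy usage: a 2-state chain with 11 atoms on [0, 10], uniform initial estimates.
atoms = np.linspace(0.0, 10.0, 11)
eta = np.full((2, len(atoms)), 1.0 / len(atoms))
eta = ctd_update(eta, s=0, r=1.0, s_next=1, atoms=atoms, gamma=0.9, step_size=0.1)
```

Repeating such updates along observed transitions (from a generative model or a Markovian trajectory) is the sampled, stochastic-approximation setting in which the paper's $1$-Wasserstein sample complexity bounds are stated.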


