
Distributional Reinforcement Learning with Dual Expectile-Quantile Regression (2305.16877v4)

Published 26 May 2023 in cs.LG and cs.AI

Abstract: Distributional reinforcement learning (RL) has proven useful in multiple benchmarks, as it enables approximating the full distribution of returns and extracting rich feedback from environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, with this substitution, the distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal-difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinitely many estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
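
To make the abstract's contrast concrete, the three loss families it discusses can be written out directly. The following is a minimal sketch, not code from the paper: NumPy and the function names are our own choices, and `u` denotes a temporal-difference residual (target minus prediction) with asymmetry fraction `tau` in (0, 1).

```python
import numpy as np

def quantile_loss(u, tau):
    """Asymmetric L1 ("pinball") loss; its minimizer is the tau-quantile."""
    u = np.asarray(u, dtype=float)
    indicator = np.where(u < 0, 1.0, 0.0)  # 1 where the residual is negative
    return np.abs(tau - indicator) * np.abs(u)

def expectile_loss(u, tau):
    """Asymmetric L2 loss; its minimizer is the tau-expectile (tau = 0.5 gives the mean)."""
    u = np.asarray(u, dtype=float)
    indicator = np.where(u < 0, 1.0, 0.0)
    return np.abs(tau - indicator) * u ** 2

def quantile_huber_loss(u, tau, kappa=1.0):
    """Asymmetric hybrid L1-L2 loss: quadratic within kappa of zero, linear outside.
    Some implementations additionally divide the Huber term by kappa."""
    u = np.asarray(u, dtype=float)
    indicator = np.where(u < 0, 1.0, 0.0)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    return np.abs(tau - indicator) * huber

# Example: evaluate each loss on a batch of residuals for the 0.9 fraction.
residuals = np.array([-2.0, -0.5, 0.1, 1.5])
print(quantile_loss(residuals, tau=0.9))
print(expectile_loss(residuals, tau=0.9))
print(quantile_huber_loss(residuals, tau=0.9))
```

Minimizing the pure quantile loss recovers quantiles, whereas the tau = 0.5 expectile is exactly the mean; this is why an $L_2$-flavored objective, for all its optimization efficiency, tends to pull the estimated distribution toward its mean, the collapse the abstract describes. The paper's proposal is to keep the efficient $L_2$-style learning by estimating expectiles jointly with the quantiles that preserve the full return distribution.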

