An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks (2405.04017v1)

Published 7 May 2024 in cs.LG, math.OC, and cs.AI

Abstract: Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best-known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
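
To make the setting concrete, below is a minimal sketch of semi-gradient TD(0) with a small two-layer ReLU value network on a toy random-walk chain. This is not the paper's algorithm: the analysis there covers a general $L$-layer network under Markovian sampling with its own step-size, width, and initialization assumptions. The network width m, step size alpha, and the toy MDP here are purely illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem sizes and hyperparameters (illustrative, not from the paper).
n_states, gamma, alpha, m = 5, 0.9, 0.05, 64
W1 = rng.normal(scale=1.0 / np.sqrt(n_states), size=(m, n_states))
W2 = rng.normal(scale=1.0 / np.sqrt(m), size=(m,))

def features(s):
    """One-hot encoding of a discrete state."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

def value(x):
    """Two-layer ReLU value network; returns V(x) and the hidden activations."""
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h, h

s = 2
for t in range(20_000):  # one long Markovian trajectory (no i.i.d. resampling)
    a = rng.choice([-1, 1])                      # fixed random-walk behaviour policy
    s_next = int(np.clip(s + a, 0, n_states - 1))
    r = 1.0 if s_next == n_states - 1 else 0.0   # reward for reaching the right end

    x, x_next = features(s), features(s_next)
    v, h = value(x)
    v_next, _ = value(x_next)
    delta = r + gamma * v_next - v               # TD error

    # Semi-gradient step: differentiate only through V(s), not the bootstrap V(s').
    grad_W2 = delta * h
    grad_W1 = delta * np.outer(W2 * (h > 0.0), x)
    W2 += alpha * grad_W2
    W1 += alpha * grad_W1

    s = s_next

print("learned state values:",
      np.round([value(features(i))[0] for i in range(n_states)], 3))
```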

