Papers
Topics
Authors
Recent
2000 character limit reached

Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation (2302.13087v2)

Published 25 Feb 2023 in math.OC and cs.LG

Abstract: In this paper, a Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation. In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of Mean-Squared Bellman Error (MSBE), where target networks are adopted to avoid double sampling. Inexact GN steps are analyzed so that one can safely and efficiently compute the GN updates by cheap matrix iterations. Under mild conditions, non-asymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations. In particular, for neural network parameterization with relu activation, GNTD achieves an improved sample complexity of $\tilde{\mathcal{O}}(\varepsilon{-1})$, as opposed to the $\mathcal{\mathcal{O}}(\varepsilon{-2})$ sample complexity of the existing neural TD methods. An $\tilde{\mathcal{O}}(\varepsilon{-1.5})$ sample complexity of GNTD is also established for general smooth function approximations. We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.