Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK (2007.04596v1)

Published 9 Jul 2020 in cs.LG, math.OC, and stat.ML

Abstract: We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star}\in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomial samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a polynomial number of samples in $d$, has population loss at least $\Omega(1/d)$.
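
To make the setup concrete, the following is a minimal NumPy sketch of the teacher–student problem described in the abstract: Gaussian inputs, a teacher $f^{\star}(x) = a^{\top}|W^{\star}x|$ with orthonormal $W^{\star}$ and nonnegative $a$, and an over-parametrized two-layer ReLU student trained by full-batch gradient descent on the empirical squared loss. The width, step size, sample size, and initialization scale below are illustrative choices, not the values analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 10, 100, 2000        # input dim, student width, sample size (illustrative)
lr, steps = 0.02, 2000         # step size and iteration count (illustrative)

# Teacher: orthonormal W* (QR of a Gaussian matrix) and a nonnegative vector a
W_star, _ = np.linalg.qr(rng.standard_normal((d, d)))
a = np.abs(rng.standard_normal(d))

def teacher(X):
    """Ground-truth labels f*(x) = a^T |W* x|."""
    return np.abs(X @ W_star.T) @ a

# Gaussian inputs and their teacher labels
X = rng.standard_normal((n, d))
y = teacher(X)

# Over-parametrized student f(x) = sum_j b_j * ReLU(w_j . x), random initialization
W = rng.standard_normal((m, d)) / np.sqrt(d)
b = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

for _ in range(steps):
    pre = X @ W.T                                        # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                           # ReLU
    err = act @ b - y                                    # residuals of the squared loss
    grad_b = act.T @ err / n                             # gradient of 0.5 * mean(err^2) w.r.t. b
    grad_W = (err[:, None] * (pre > 0) * b).T @ X / n    # gradient w.r.t. W
    b -= lr * grad_b
    W -= lr * grad_W

pred = np.maximum(X @ W.T, 0.0) @ b
print("empirical squared loss:", np.mean((pred - y) ** 2))
```

This sketch only illustrates the objective and data model; the paper's result concerns the population loss and relies on specific choices of over-parametrization, initialization, and step size beyond this toy training loop.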

Citations (26)
