Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel (2107.12723v2)

Published 27 Jul 2021 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: We revisit on-average algorithmic stability of GD for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the NTK or PL (Polyak-Łojasiewicz) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
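The setting in the abstract is full-batch gradient descent on an overparameterised shallow network for noisy regression, with early stopping. Below is a minimal illustrative sketch of that setup, not taken from the paper: the width m, step size eta, data-generating function, and held-out-set stopping rule are all assumptions made for the example.

```python
# Illustrative sketch only: GD on an overparameterised shallow ReLU network for
# noisy regression, with early stopping chosen on a held-out set. Hyperparameters
# and the stopping rule are assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)

# Noisy regression data: y = f*(x) + noise
n, d, m = 200, 5, 2000                     # samples, input dim, hidden width (overparameterised)
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d)) + 0.1 * rng.standard_normal(n)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

# Shallow network f(x) = (1/sqrt(m)) * a^T relu(W x), both layers trained by GD
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m)

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)           # hidden activations, shape (n, m)
    return H @ a / np.sqrt(m), H

def mse(pred, y):
    return 0.5 * np.mean((pred - y) ** 2)

eta, T = 0.5, 2000
best_va, best_t = np.inf, 0
for t in range(T):
    pred, H = forward(X_tr, W, a)
    resid = (pred - y_tr) / len(y_tr)      # gradient of the empirical MSE w.r.t. predictions
    grad_a = H.T @ resid / np.sqrt(m)
    grad_W = ((np.outer(resid, a) * (H > 0)).T @ X_tr) / np.sqrt(m)
    a -= eta * grad_a
    W -= eta * grad_W

    va_loss = mse(forward(X_va, W, a)[0], y_va)
    if va_loss < best_va:                  # early stopping: keep the best iterate on held-out data
        best_va, best_t = va_loss, t

print(f"early-stopping iterate t* = {best_t}, held-out MSE = {best_va:.4f}")
```

In this sketch the early-stopping time is selected by held-out error; the paper's consistency result concerns GD with early stopping under label noise, but the specific stopping rule here is only a stand-in for illustration.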

Authors (2)
  1. Dominic Richards (11 papers)
  2. Ilja Kuzborskij (23 papers)
Citations (25)
