
Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

(2309.10370)
Published Sep 19, 2023 in cs.LG, cs.AI, math-ph, math.MP, math.OC, and stat.ML

Abstract

In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow neural networks through the explicit construction of upper bounds, without any use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider shallow neural networks with one hidden layer, a ReLU activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$ that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$ where $\delta_P$ measures the signal-to-noise ratio of training inputs. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.
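To make the setting concrete, the sketch below sets up the kind of network and cost the abstract describes: a shallow ReLU network mapping ${\mathbb R}^M$ to ${\mathbb R}^Q$ with $Q\leq M$, evaluated on $N > QM$ training inputs, with an ${\mathcal L}^2$ (Frobenius/Hilbert-Schmidt-type) cost on the residuals. This is an illustrative assumption, not the paper's construction: the hidden-layer width, the exact cost normalization, and all variable names and random data here are hypothetical.

```python
import numpy as np

# Dimensions as in the abstract: input dim M, output dim Q <= M,
# and underparametrized sample size N > Q*M (here Q*M = 24 < N = 40).
M, Q, N = 8, 3, 40
rng = np.random.default_rng(0)

# Hypothetical parameters of a shallow network with one hidden layer
# (width M assumed here for illustration).
W1, b1 = rng.normal(size=(M, M)), rng.normal(size=M)
W2, b2 = rng.normal(size=(Q, M)), rng.normal(size=Q)

def relu(z):
    return np.maximum(z, 0.0)

def network(X):
    """Apply the shallow ReLU network row-wise to inputs X of shape (N, M)."""
    return relu(X @ W1.T + b1) @ W2.T + b2

# Synthetic training data: inputs in R^M, target outputs in R^Q.
X = rng.normal(size=(N, M))
Y = rng.normal(size=(N, Q))

# L^2 (Frobenius) cost of the residual matrix over all N samples,
# i.e. the square root of the sum of squared entries.
residual = network(X) - Y
cost = np.linalg.norm(residual)
print(cost)
```

The paper constructs explicit parameter choices that bound this kind of cost by $O(\delta_P)$ without gradient descent; the snippet only fixes the objective being bounded, with randomly drawn weights standing in for any concrete parameter assignment.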
