Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning
(2311.07065)
Published Nov 13, 2023 in cs.LG, cs.AI, math-ph, math.MP, math.OC, and stat.ML
Abstract
We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks generically cannot be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.
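For context, the setting of the abstract can be sketched concretely: gradient descent applied to the $\mathcal{L}^2$ (least-squares) cost of a small underparametrized ReLU network. This is a minimal illustrative sketch only, not the construction of [Chen-Munoz Ewald 2023]; all dimensions, data, and the learning rate are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underparametrized setting: more training samples (N=20) than hidden
# ReLU units (m=3). Data and sizes are illustrative assumptions.
N, d, m = 20, 2, 3
X = rng.normal(size=(N, d))
y = rng.normal(size=(N, 1))

# Hidden ReLU layer (W1, b1) and linear output layer (W2, b2).
W1 = rng.normal(size=(d, m)) * 0.5
b1 = np.zeros(m)
W2 = rng.normal(size=(m, 1)) * 0.5
b2 = np.zeros(1)

def forward(X, W1, b1, W2, b2):
    H = np.maximum(X @ W1 + b1, 0.0)      # ReLU activations
    return H, H @ W2 + b2

def l2_cost(pred, y):
    # L^2 cost: mean squared error (up to a constant factor).
    return 0.5 * np.mean((pred - y) ** 2)

lr = 0.05                                  # illustrative learning rate
costs = []
for _ in range(500):
    H, pred = forward(X, W1, b1, W2, b2)
    costs.append(l2_cost(pred, y))
    # Backpropagation for the L^2 cost.
    g = (pred - y) / N                     # dC/dpred
    dW2 = H.T @ g
    db2 = g.sum(axis=0)
    gH = (g @ W2.T) * (H > 0)              # ReLU subgradient
    dW1 = X.T @ gH
    db1 = gH.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

The iteration reduces the cost along the gradient flow, but, per the paper's result, the limit point it approaches is generically distinct from the constructively obtained global minimizers.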