On non-approximability of zero loss global ${\mathcal L}^2$ minimizers by gradient descent in Deep Learning (2311.07065v3)
Published 13 Nov 2023 in cs.LG, cs.AI, math-ph, math.MP, math.OC, and stat.ML
Abstract: We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL), and discuss in detail the circumstance that, in underparametrized DL networks, zero loss minimization generically cannot be attained. As a consequence, we conclude that the distribution of training inputs must be non-generic in order to produce zero loss minimizers, both for the method constructed in [Chen-Munoz Ewald 2023, 2024] and for gradient descent [Chen 2025].
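As a toy illustration of the underparametrization obstruction described in the abstract (a minimal sketch, not code from the paper): for a linear model with d parameters fit to N > d generic training points, the system Xw = y is overdetermined and generically inconsistent, so the minimal ${\mathcal L}^2$ loss is strictly positive and gradient descent converges to a nonzero loss. All names and parameters below are illustrative assumptions.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's construction):
# underparametrized linear model f(x) = w @ x with d parameters,
# trained on N > d generic data points by gradient descent.

rng = np.random.default_rng(0)
N, d = 50, 5                      # N training points, d parameters (N > d)
X = rng.standard_normal((N, d))   # generic inputs
y = rng.standard_normal(N)        # generic targets, generically not in range(X)

w = np.zeros(d)
lr = 1e-2
for _ in range(20_000):
    grad = 2.0 * X.T @ (X @ w - y) / N   # gradient of the mean squared loss
    w -= lr * grad

loss_gd = np.mean((X @ w - y) ** 2)

# Exact least-squares minimum for comparison: it is nonzero because
# the overdetermined system X w = y is generically inconsistent.
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
loss_star = np.mean((X @ w_star - y) ** 2)

print(f"gradient descent loss: {loss_gd:.6f}")
print(f"optimal L^2 loss:      {loss_star:.6f}  (> 0 for generic y)")
```

Both printed losses agree and remain bounded away from zero, consistent with the abstract's point that zero loss minimizers require non-generic training data in the underparametrized regime.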