
Convergence of Gradient Descent with Small Initialization for Unregularized Matrix Completion

(2402.06756)
Published Feb 9, 2024 in cs.LG, math.OC, and stat.ML

Abstract

We study the problem of symmetric matrix completion, where the goal is to reconstruct a positive semidefinite matrix $\mathbf{X}^{\star} \in \mathbb{R}^{d\times d}$ of rank $r$, parameterized by $\mathbf{U}\mathbf{U}^{\top}$, from only a subset of its observed entries. We show that the vanilla gradient descent (GD) with small initialization provably converges to the ground truth $\mathbf{X}^{\star}$ without requiring any explicit regularization. This convergence result holds true even in the over-parameterized scenario, where the true rank $r$ is unknown and conservatively over-estimated by a search rank $r'\gg r$. The existing results for this problem either require explicit regularization, a sufficiently accurate initial point, or exact knowledge of the true rank $r$. In the over-parameterized regime where $r'\geq r$, we show that, with $\widetilde\Omega(dr^9)$ observations, GD with an initial point $\|\mathbf{U}_0\| \leq \epsilon$ converges near-linearly to an $\epsilon$-neighborhood of $\mathbf{X}^{\star}$. Consequently, smaller initial points result in increasingly accurate solutions. Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$. In the exactly-parameterized regime where $r'=r$, we further enhance this result by proving that GD converges at a faster rate to achieve an arbitrarily small accuracy $\epsilon>0$, provided the initial point satisfies $\|\mathbf{U}_0\| = O(1/d)$. At the crux of our method lies a novel weakly-coupled leave-one-out analysis, which allows us to establish the global convergence of GD, extending beyond what was previously possible using the classical leave-one-out analysis.

Overview

  • The paper analyzes the convergence of gradient descent for unregularized symmetric matrix completion in both the over-parameterized and exactly-parameterized settings.

  • It demonstrates that gradient descent with a sufficiently small initialization converges to the ground truth matrix at a near-linear rate, unaffected by over-estimation of the rank.

  • The study introduces a 'weakly-coupled leave-one-out analysis' framework, enabling a global convergence analysis in settings that do not satisfy the restricted isometry property (RIP).

  • The findings challenge prevailing assumptions that accurate rank estimation and explicit regularization are necessary for convergence, and point to implications for future research on related optimization problems.

Convergence Analysis of Gradient Descent for Unregularized Matrix Completion

Introduction to the Problem and Results

Matrix completion is a classical problem that appears across many domains in machine learning, where the objective is to infer the missing entries of a matrix from a subset of its observed elements. The paper addresses the convergence of gradient descent (GD) for the unregularized matrix completion problem, focusing on symmetric matrices. Notably, the analysis covers both the over-parameterized setting, where the rank of the ground truth matrix is unknown and potentially over-estimated, and the exactly-parameterized regime.
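For concreteness, the unregularized factorized formulation underlying these results can be written as follows; the notation matches the abstract, while the uniform sampling rate $p$ and the $1/(2p)$ scaling are standard choices stated here as an assumption rather than quoted from the paper:

$$
\min_{\mathbf{U} \in \mathbb{R}^{d \times r'}} \; f(\mathbf{U}) = \frac{1}{2p} \left\| \mathcal{P}_{\Omega}\!\left( \mathbf{U}\mathbf{U}^{\top} - \mathbf{X}^{\star} \right) \right\|_F^2, \qquad \mathbf{U}_{t+1} = \mathbf{U}_t - \eta \nabla f(\mathbf{U}_t), \qquad \|\mathbf{U}_0\| \leq \epsilon,
$$

where $\Omega$ is the set of observed entries, $\mathcal{P}_{\Omega}$ zeroes out the unobserved entries, $p$ is the sampling rate, and $\eta$ is the step size. No regularizer or projection appears in the objective or in the update.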

The key contributions of this research can be encapsulated as follows:

  • Convergence of GD with small initialization in the over-parameterized scenario: It is proved that GD, initialized with a sufficiently small magnitude, converges to the ground truth matrix at a near-linear rate regardless of how much the rank is over-estimated. Intriguingly, neither the convergence rate nor the final accuracy is affected by the search rank as long as it does not under-estimate the true rank, challenging previous suggestions in the literature that over-parameterization would complicate or hinder convergence.
  • Enhanced convergence in the exactly-parameterized regime: When the true rank is known, the analysis establishes a faster convergence rate and a reduced sample complexity, showing that a precise estimate of the rank significantly benefits the optimization trajectory.
  • A novel analytical framework: The investigation introduces a 'weakly-coupled leave-one-out analysis' framework, extending traditional leave-one-out methods and allowing for a global convergence analysis of GD in matrix completion settings that do not satisfy the restricted isometry property (RIP).

Problem Setup and Main Results

The research examines the behavior of the gradient descent algorithm when applied to the matrix completion problem without any form of explicit regularization. The primary focus lies on symmetric matrix completion of low-rank positive semidefinite matrices. Through rigorous analysis, the study shows that explicit regularization and projection steps, previously deemed essential for ensuring the convergence of GD to the ground truth, are in fact unnecessary.

Under the established framework, the analysis reveals that with a sufficiently small initialization and enough observed entries (dictated by the sampling rate), GD provably converges to the ground truth matrix within a specified accuracy. This holds true even when the rank of the matrix is over-estimated, a scenario often encountered in practice because the true rank is unknown.
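A minimal numerical sketch of this setting is given below. It is an illustration of the phenomenon, not the authors' implementation; the dimension, true and search ranks, sampling rate, step size, and iteration count are arbitrary choices made for demonstration.

import numpy as np

# Rank-r PSD ground truth X* = U* U*^T, observed on a random symmetric mask.
rng = np.random.default_rng(0)
d, r, r_search = 100, 3, 10              # true rank r, over-estimated search rank r' >> r
U_star = rng.standard_normal((d, r))
X_star = U_star @ U_star.T

p = 0.3                                   # sampling rate (illustrative)
upper = np.triu(rng.random((d, d)) < p)
mask = upper | upper.T                    # symmetric observation pattern

# Gradient of the unregularized factorized loss f(U) = ||P_Omega(U U^T - X*)||_F^2 / (2p).
def grad(U):
    residual = mask * (U @ U.T - X_star)
    return (2.0 / p) * residual @ U

# Vanilla GD with small initialization ||U_0|| ~ eps; no regularization or projection.
eps, eta, iters = 1e-6, 2e-3, 3000
U = eps * rng.standard_normal((d, r_search)) / np.sqrt(d * r_search)

for _ in range(iters):
    U = U - eta * grad(U)

print("relative error:", np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))

Re-running the sketch with r_search set equal to r, and an initialization on the order of 1/d, corresponds to the exactly-parameterized regime in which the paper proves a faster rate.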

Theoretical Implications and Future Directions

This paper's findings have substantial implications for both theoretical understanding and practical application of gradient descent methods in unregularized matrix completion problems. By challenging and extending the existing theoretical frameworks, this research provides a foundation for exploring similar optimization problems within and beyond matrix completion, potentially influencing future algorithm design and analysis methodologies.

Moreover, the introduction of the novel weakly-coupled leave-one-out analysis technique not only facilitates this paper's convergence proofs but also sets a precedent for analyzing gradient descent in over-parameterized settings more broadly. Looking ahead, this could pave the way for new research on the intrinsic properties of gradient-based optimization, especially in the over-parameterized regimes that are increasingly prevalent in modern machine learning models.

In conclusion, this paper provides comprehensive insights into the convergence behavior of gradient descent for unregularized matrix completion tasks, expanding the theoretical understanding and challenging prevailing notions in the field. Future research could further explore the boundaries of these results, investigating their applicability and implications in a wider array of optimization problems and settings.
