
Dictionary Learning for the Almost-Linear Sparsity Regime

Published Oct 19, 2022 in cs.LG, eess.SP, math.PR, and stat.ML


Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which require super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on a family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an "oracle property": the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime.
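To make the sampling model $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$ concrete, the following is a minimal sketch of how synthetic data of this form could be generated. It does not implement SPORADIC. The function name `sample_sparse_model`, the Gaussian dictionary with unit-norm columns, the uniformly random supports, and the $\pm 1$ nonzero coefficients are all illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def sample_sparse_model(M, K, N, s, seed=0):
    """Draw N samples y_i = D x_i with s-sparse x_i (hypothetical helper).

    Assumptions made for illustration only: D has i.i.d. Gaussian entries
    with columns rescaled to unit norm, supports are uniform over size-s
    subsets, and nonzero coefficients are random signs.
    """
    rng = np.random.default_rng(seed)

    # Overcomplete dictionary (K > M is allowed) with unit-norm columns.
    D = rng.standard_normal((M, K))
    D /= np.linalg.norm(D, axis=0, keepdims=True)

    # Each column of X is one s-sparse coefficient vector x_i.
    X = np.zeros((K, N))
    for i in range(N):
        support = rng.choice(K, size=s, replace=False)
        X[support, i] = rng.choice([-1.0, 1.0], size=s)

    # Observed samples, one per column.
    Y = D @ X
    return D, X, Y


D, X, Y = sample_sparse_model(M=64, K=128, N=1000, s=8)
```

In this setup a dictionary learning method sees only `Y` and must estimate `D` (and the columns of `X`) up to the usual permutation and sign ambiguities of the columns.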

