Emergent Mind

High-Dimensional Geometric Streaming for Nearly Low Rank Data

(2406.02910)
Published Jun 5, 2024 in cs.DS

Abstract

We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional subspace $V$ such that $\left(\sum_{i=1}^{n} d(a_i, V)^p\right)^{1/p}$ is minimized, where $d(a, V)$ denotes the Euclidean distance between $a$ and $V$, defined as $\min_{v \in V} \|a - v\|_2$. When $p = \infty$, we need to find a subspace $V$ that minimizes $\max_i d(a_i, V)$. For $\ell_{\infty}$ subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a $\text{poly}(k, \log n)$ approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear space algorithm. For $\ell_p$ subspace approximation, we show that by suitably scaling the points and then using our $\ell_{\infty}$ coreset construction, we can compute a $\text{poly}(k, \log n)$ approximation. Our algorithms are easy to implement and run very fast on large datasets. We also use our strong coreset construction to improve the results in a recent work of Woodruff and Yasuda (FOCS 2022), which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, and volume estimation.
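To make the objective concrete, the following minimal sketch evaluates the $\ell_p$ subspace approximation cost of a fixed candidate subspace, taking $d(a, V)$ to be the Euclidean distance as the abstract describes. It only evaluates the cost; it is not the paper's streaming or coreset algorithm. The function names `dist_to_subspace` and `subspace_cost`, the requirement that the basis of $V$ be supplied as orthonormal vectors, and the sample points are all assumptions made for illustration.

```python
import math

def dist_to_subspace(a, basis):
    """Euclidean distance from point a to span(basis).

    Assumption: the vectors in `basis` are orthonormal, so the projection
    of a onto the subspace is the sum of (a . u) * u over basis vectors u.
    """
    residual = list(a)
    for u in basis:
        coeff = sum(x * y for x, y in zip(a, u))
        residual = [r - coeff * y for r, y in zip(residual, u)]
    return math.sqrt(sum(r * r for r in residual))

def subspace_cost(points, basis, p):
    """l_p subspace approximation cost: (sum_i d(a_i, V)^p)^(1/p).

    For p = math.inf this is max_i d(a_i, V).
    """
    dists = [dist_to_subspace(a, basis) for a in points]
    if math.isinf(p):
        return max(dists)
    return sum(d ** p for d in dists) ** (1.0 / p)

# Hypothetical example: three points in R^3, and V = the xy-plane (k = 2).
points = [(1.0, 0.0, 3.0), (0.0, 2.0, 4.0), (5.0, 5.0, 0.0)]
xy_plane = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

cost_l1 = subspace_cost(points, xy_plane, 1)           # 3 + 4 + 0 = 7
cost_l2 = subspace_cost(points, xy_plane, 2)           # sqrt(9 + 16) = 5
cost_linf = subspace_cost(points, xy_plane, math.inf)  # max(3, 4, 0) = 4
```

The distances of the three sample points to the xy-plane are their absolute z-coordinates (3, 4, and 0), which shows how increasing $p$ shifts the cost from the total deviation toward the single worst point.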
