
Sample-Efficient Linear Regression with Self-Selection Bias

(2402.14229)
Published Feb 22, 2024 in math.ST, cs.DS, cs.LG, and stat.TH

Abstract

We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell},z_{\ell})_{\ell=1}^m$ where $z_{\ell}=\max_{i\in [k]}\{\mathbf{x}_{\ell}^T\mathbf{w}_i+\eta_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0,I_n)$ and the noise vector $\mathbf{\eta}_{\ell}\sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k\in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n)\cdot \mathsf{poly}(k,1/\varepsilon)$ and significantly improved time complexity $\mathsf{poly}(n,k,1/\varepsilon)+O(\log(k)/\varepsilon)^{O(k)}$. When $k=O(1)$, our algorithm runs in $\mathsf{poly}(n,1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment-matching algorithm of Cherapanamjeri et al. for $k=2$ and when it is known that $\mathcal{D}=\mathcal{N}(0,I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\varepsilon$, and leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
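To make the observation model concrete, here is a minimal NumPy sketch of how synthetic data could be generated in the two settings the abstract describes: self-selection (noise inside the maximum, argmax index unobserved) and max-linear regression (noise outside the maximum). This is only an illustration of the data-generating process, not the paper's recovery algorithm; the dimensions m, n, k, the noise scales, and the choice of Gaussian noise for $\mathcal{D}$ are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 10_000, 20, 3            # samples, ambient dimension, number of regressors (illustrative)
W = rng.normal(size=(k, n))        # unknown weight vectors w_1, ..., w_k
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Covariates x_l ~ N(0, I_n); noise eta_l ~ D, centered and independent of x_l
# (Gaussian here only for illustration; the paper allows more general D).
X = rng.normal(size=(m, n))
eta = rng.normal(scale=0.1, size=(m, k))

# Self-selection, unknown index: z_l = max_i { x_l^T w_i + eta_{i,l} };
# only (x_l, z_l) is observed, not the maximizing index i_l.
z_self_selection = (X @ W.T + eta).max(axis=1)

# Max-linear regression: z_l = max_i { x_l^T w_i } + eps_l, noise outside the max.
eps = rng.normal(scale=0.1, size=m)
z_max_linear = (X @ W.T).max(axis=1) + eps
```

In both cases the learner sees only the pairs (X, z) and must recover the rows of W up to small $\ell_2$-error, which is the recovery task the paper's algorithm addresses.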
