Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints (2106.08597v1)

Published 16 Jun 2021 in stat.ML and cs.LG

Abstract: We consider the problem of estimating a $d$-dimensional $s$-sparse discrete distribution from its samples observed under a $b$-bit communication constraint. The best-known previous result on $\ell_2$ estimation error for this problem is $O\left( \frac{s\log\left( d/s \right)}{n 2^b} \right)$. Surprisingly, we show that when sample size $n$ exceeds a minimum threshold $n^*(s, d, b)$, we can achieve an $\ell_2$ estimation error of $O\left( \frac{s}{n 2^b} \right)$. This implies that when $n > n^*(s, d, b)$ the convergence rate does not depend on the ambient dimension $d$ and is the same as knowing the support of the distribution beforehand. We next ask the question: "what is the minimum $n^*(s, d, b)$ that allows dimension-free convergence?". To upper bound $n^*(s, d, b)$, we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that $n^*(s, d, b) = O\left( \min\left( d^2\log^2 d / 2^b,\ s^4\log^2 d / 2^b \right) \right)$. Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when $n = \tilde{\Omega}\left( s^4\log^4 d / 2^b \right)$. This group testing based scheme is adaptive to the sparsity parameter $s$, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample size needed to achieve dimension-free convergence can be further reduced to $n^*(s, d, b) = \tilde{O}\left( s^2\log^2 d / 2^b \right)$.
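The key idea in the abstract is a two-stage approach: first localize the unknown $s$-sparse support from $b$-bit messages, then estimate the distribution restricted to that support, recovering the rate one would get if the support were known in advance. The toy simulation below illustrates this general setup; it is a minimal sketch, not the paper's scheme. The hashing-based localizer, the vote threshold, and all parameter values are our own illustrative assumptions.

```python
# Toy simulation of two-stage sparse distribution estimation under a
# b-bit communication constraint: (1) localize the support from b-bit
# hashed messages, (2) estimate probabilities on the localized support.
# Illustrative sketch only; the localizer and thresholds are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, s, b, n = 1024, 8, 4, 200_000        # ambient dim, sparsity, bits/user, samples
support = rng.choice(d, size=s, replace=False)
p = np.zeros(d)
p[support] = rng.dirichlet(np.ones(s))  # s-sparse target distribution

samples = rng.choice(d, size=n, p=p)
half = n // 2

# Stage 1 (localization): each user hashes its sample into 2^b buckets and
# sends only the b-bit bucket index. Repeating over fresh random hashes and
# voting keeps symbols that consistently land in heavy buckets.
num_rounds = 10
per_round = half // num_rounds
votes = np.zeros(d)
for r in range(num_rounds):
    h = rng.integers(0, 2**b, size=d)   # fresh random hash of all symbols
    batch = samples[r * per_round:(r + 1) * per_round]
    counts = np.bincount(h[batch], minlength=2**b)
    heavy = counts > per_round / 2**b   # buckets with above-average mass
    votes += heavy[h]                   # credit every symbol in a heavy bucket
est_support = np.argsort(votes)[-s:]    # keep the s most-voted symbols

# Stage 2 (estimation on the localized support): since s <= 2^b here, each
# remaining user could report in b bits which localized symbol (if any) it
# saw; we simulate the resulting counts directly.
tail = samples[half:]
idx = {sym: i for i, sym in enumerate(est_support)}
counts2 = np.zeros(s)
for x in tail[np.isin(tail, est_support)]:
    counts2[idx[x]] += 1
p_hat = np.zeros(d)
p_hat[est_support] = counts2 / len(tail)

print("recovered support fraction:", np.isin(support, est_support).mean())
print("squared l2 error:", np.sum((p_hat - p) ** 2))
```

Once the support is correctly localized, stage 2 is just estimating an $s$-dimensional distribution, which is why the resulting error no longer depends on $d$.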

Citations (11)
