
$\ell_p$ Testing and Learning of Discrete Distributions

(1412.2314)
Published Dec 7, 2014 in cs.DS, cs.LG, math.ST, and stat.TH

Abstract

The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general $\ell_p$ metrics. The intuitions and results often contrast with the classic $\ell_1$ case. For $p > 1$, we can learn and test with a number of samples that is independent of the support size of the distribution: With an $\ell_p$ tolerance $\epsilon$, $O(\max\{ \sqrt{1/\epsilon^q}, 1/\epsilon^2 \})$ samples suffice for testing uniformity and $O(\max\{ 1/\epsilon^q, 1/\epsilon^2 \})$ samples suffice for learning, where $q = p/(p-1)$ is the conjugate of $p$. As this parallels the intuition that $O(\sqrt{n})$ and $O(n)$ samples suffice for the $\ell_1$ case, it seems that $1/\epsilon^q$ acts as an upper bound on the "apparent" support size. For some $\ell_p$ metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if $p > \frac{4}{3}$. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all $1 \leq p \leq 2$. Another algorithm gives order-optimal sample complexity for $\ell_\infty$ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all $\ell_p$ metrics. The author thanks Clément Canonne for discussions and contributions to this work.
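
The abstract describes the tester only at a high level. As a rough illustration of the collision idea (specialized to the $\ell_2$ case), the sketch below counts colliding sample pairs and thresholds the count against what the uniform distribution would produce; the `slack` constant and the $\ell_2$-specific threshold are assumptions made for this sketch, not the paper's actual algorithm or constants.

```python
from collections import Counter

def collision_uniformity_test(samples, n, epsilon, slack=0.5):
    """Toy collision-based uniformity tester (illustrative sketch only).

    Accepts if the number of colliding sample pairs is close to the
    m*(m-1)/(2n) pairs expected under the uniform distribution on n
    symbols. Under the l2 metric, a distribution eps-far from uniform
    has collision probability at least 1/n + eps^2, which inflates the
    expected count; `slack` is a hypothetical threshold constant, not a
    value taken from the paper.
    """
    m = len(samples)
    pairs = m * (m - 1) / 2
    counts = Counter(samples)
    # Number of colliding pairs: sum over symbols of C(count, 2).
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    threshold = pairs * (1.0 / n + slack * epsilon ** 2)
    return collisions <= threshold  # True = "looks uniform"

if __name__ == "__main__":
    import random
    # A fair 6-sided die should typically pass with tolerance 0.1.
    samples = [random.randrange(6) for _ in range(2000)]
    print(collision_uniformity_test(samples, n=6, epsilon=0.1))
```

Note the sample size in this toy demo is chosen arbitrarily; the paper's point is that, for $p > 1$, the number of samples needed depends only on $\epsilon$ (through the conjugate exponent $q$), not on the support size $n$.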
