$\ell_p$ Testing and Learning of Discrete Distributions (1412.2314v4)
Abstract: The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general $\ell_p$ metrics. The intuitions and results often contrast with the classic $\ell_1$ case. For $p > 1$, we can learn and test with a number of samples that is independent of the support size of the distribution: With an $\ell_p$ tolerance $\epsilon$, $O(\max{ \sqrt{1/\epsilonq}, 1/\epsilon2 })$ samples suffice for testing uniformity and $O(\max{ 1/\epsilonq, 1/\epsilon2})$ samples suffice for learning, where $q=p/(p-1)$ is the conjugate of $p$. As this parallels the intuition that $O(\sqrt{n})$ and $O(n)$ samples suffice for the $\ell_1$ case, it seems that $1/\epsilonq$ acts as an upper bound on the "apparent" support size. For some $\ell_p$ metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if $p > \frac{4}{3}$. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all $1 \leq p \leq 2$. Another algorithm gives order-optimal sample complexity for $\ell_{\infty}$ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all $\ell_p$ metrics. The author thanks Cl\'{e}ment Canonne for discussions and contributions to this work.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.