Estimating Entropy of Distributions in Constant Space (1911.07976v1)
Abstract: We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log (1/\varepsilon)2}{\varepsilon3}\right)$ samples and a constant $O(1)$ memory words of space and outputs a $\pm\varepsilon$ estimate of $H(p)$. Without space limitations, the sample complexity has been established as $S(k,\varepsilon)=\Theta\left(\frac k{\varepsilon\log k}+\frac{\log2 k}{\varepsilon2}\right)$, which is sub-linear in the domain size $k$, and the current algorithms that achieve optimal sample complexity also require nearly-linear space in $k$. Our algorithm partitions $[0,1]$ into intervals and estimates the entropy contribution of probability values in each interval. The intervals are designed to trade off the bias and variance of these estimates.
- Jayadev Acharya (45 papers)
- Sourbh Bhadane (7 papers)
- Piotr Indyk (66 papers)
- Ziteng Sun (29 papers)