Parameter estimation for integer-valued Gibbs distributions (1904.03139v6)

Published 5 Apr 2019 in math.PR, cs.CC, and cs.DM

Abstract: A central problem in computational statistics is to convert a procedure for sampling combinatorial from an objects into a procedure for counting those objects, and vice versa. Weconsider sampling problems coming from Gibbs distributions, which are probability distributions of the form $\mu^{\Omega_\beta(\omega)} \propto e^{\beta H(\omega)}$ for $\beta$ in an interval $[\beta_\min, \beta_\max]$ and $H( \omega ) \in {0 } \cup [1, n]$. The partition function is the normalization factor $Z(\beta)=\sum_{\omega \in\Omega}e^{\beta H(\omega)}$. Two important parameters are the log partition ratio $q = \log \tfrac{Z(\beta_\max)}{Z(\beta_\min)}$ and the vector of counts $c_x = |H^{-1}(x)|$. Our first result is an algorithm to estimate the counts $c_x$ using roughly $\tilde O( \frac{q}{\epsilon^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^{2}{\epsilon^2}} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs and perfect matchings in a graph. We develop a key subroutine for global estimation of the partition function. Specifically, we produce a data structure to estimate $Z(\beta)$ for \emph{all} values $\beta$, without further samples. Constructing the data structure requires $O(\frac{q \log n}{\epsilon^2})$ samples for general Gibbs distributions and $O(\frac{n² \log n}{\epsilon^2} + n \log q)$ samples for integer-valued distributions. This improves over a prior algorithm of Kolmogorov (2018) which computes the single point estimate $Z(\beta_\max)$ using $\tilde O(\frac{q}{\epsilon^2})$ samples. We also show that this complexity is optimal as a function of $n$ and $q$ up to logarithmic terms.