Risk Bounds for Mixture Density Estimation on Compact Domains via the $h$-Lifted Kullback--Leibler Divergence (2404.12586v2)
Abstract: We consider the problem of estimating probability density functions based on sample data, using a finite mixture of densities from some component class. To this end, we introduce the $h$-lifted Kullback--Leibler (KL) divergence as a generalization of the standard KL divergence and a criterion for conducting risk minimization. Under a compact support assumption, we prove an $\mathcal{O}(1/\sqrt{n})$ bound on the expected estimation error when using the $h$-lifted KL divergence, which extends the results of Rakhlin et al. (2005, ESAIM: Probability and Statistics, Vol. 9) and Li and Barron (1999, Advances in Neural Information Processing Systems, Vol. 12) to permit the risk bounding of density functions that are not strictly positive. We develop a procedure for the computation of the corresponding maximum $h$-lifted likelihood estimators ($h$-MLLEs) using the Majorization-Maximization framework and provide experimental results in support of our theoretical bounds.
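As a rough, hedged illustration of the MM procedure the abstract describes: assuming the empirical $h$-lifted log-likelihood takes the form $\frac{1}{n}\sum_{i}\log\big(g(X_i)+h(X_i)\big)$ for a candidate mixture $g=\sum_j \pi_j \varphi_j$ and a fixed lifting density $h$ (the paper's exact criterion may differ), a Jensen minorization yields an EM-like monotone update for the mixture weights. The Gaussian components, their locations, and the uniform choice of $h$ below are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: an MM (minorize-maximize) update for mixture
# weights under an assumed lifted objective
#   L(pi) = (1/n) * sum_i log( sum_j pi_j * phi_j(x_i) + h(x_i) ),
# with fixed components phi_j and a fixed lifting density h.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def lifted_loglik(pi, A, h_vals):
    """Average lifted log-likelihood; A[i, j] = phi_j(x_i)."""
    return np.mean(np.log(A @ pi + h_vals))

def mm_update(pi, A, h_vals):
    """One MM step: Jensen's inequality minorizes each log term, and
    maximizing the surrogate over the simplex gives an EM-like update."""
    denom = A @ pi + h_vals                  # g(x_i) + h(x_i)
    R = (A * pi) / denom[:, None]            # responsibilities of the mixture part
    new_pi = R.sum(axis=0)
    return new_pi / new_pi.sum()             # renormalize to the simplex

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)          # data on a compact domain [0, 1]
mus = np.array([0.2, 0.5, 0.8])              # assumed fixed component locations
A = np.stack([gaussian_pdf(x, m, 0.1) for m in mus], axis=1)
h_vals = np.full_like(x, 1.0)                # h = uniform density on [0, 1]

pi = np.full(3, 1.0 / 3.0)
lls = [lifted_loglik(pi, A, h_vals)]
for _ in range(50):
    pi = mm_update(pi, A, h_vals)
    lls.append(lifted_loglik(pi, A, h_vals))

# MM ascent is monotone: each update cannot decrease the objective.
assert all(b >= a - 1e-12 for a, b in zip(lls, lls[1:]))
```

Because $h$ enters the denominator of the responsibilities but receives no weight update, the lifted objective stays well defined even where the fitted mixture vanishes, which mirrors the abstract's point about densities that are not strictly positive.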
- Shun-ichi Amari. Information Geometry and Its Applications, volume 194. Springer, New York, 2016.
- Takeshi Amemiya. Advanced Econometrics. Harvard University Press, 1985.
- Clustering on the unit hypersphere using von Mises--Fisher distributions. Journal of Machine Learning Research, 6(9), 2005.
- Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
- Robust and efficient estimation by minimising a density power divergence. Biometrika, 85(3):549–559, 1998.
- Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton, 2011.
- Model complexity, goodness of fit and diminishing returns. Advances in Neural Information Processing Systems, 13, 2000.
- Imre Csiszár. Generalized projections for non-negative functions. In Proceedings of 1995 IEEE International Symposium on Information Theory, page 6. IEEE, 1995.
- Optimal Kullback--Leibler aggregation in mixture density estimation by maximum likelihood. Mathematical Statistics and Learning, 1(1):1–35, 2018.
- Convex optimization on Banach spaces. Foundations of Computational Mathematics, 16(2):369–394, 2016.
- Variational learning for finite Dirichlet mixture models and applications. IEEE Transactions on Neural Networks and Learning Systems, 23:762–774, 2012.
- Maximum $L_q$-likelihood method. Annals of Statistics, 38:573–583, 2010.
- Functional Bregman divergence and Bayesian estimation of distributions. IEEE Transactions on Information Theory, 54(11):5130–5139, 2008.
- Robust estimation in the normal mixture model. Journal of Statistical Planning and Inference, 136(11):3989–4011, 2006.
- Rényi divergence measures for commonly used univariate continuous distributions. Information Sciences, 249:124–131, 2013.
- Christophe Giraud. Introduction to High-Dimensional Statistics. CRC Press, 2021.
- Uffe Haagerup. The best constants in the Khintchine inequality. Studia Mathematica, 70(3):231–283, 1981.
- A tutorial on MM algorithms. The American Statistician, 58(1):30–37, 2004.
- Jussi S Klemelä. Density estimation with stagewise optimization of the empirical risk. Machine Learning, 67:169–195, 2007.
- Jussi S Klemelä. Smoothing of Multivariate Data: Density Estimation and Visualization. John Wiley & Sons, 2009.
- Generalized Dirichlet-process-means for $f$-separable distortion measures. Neurocomputing, 458:667–689, 2021. ISSN 0925-2312.
- Unbiased estimating equation and latent bias under $f$-separable Bregman distortion measures. IEEE Transactions on Information Theory, 2024.
- Vladimir Koltchinskii. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: École D’Été de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Mathematics. Springer, 2011. ISBN 9783642221460.
- Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II, pages 443–457, 2000.
- Michael R. Kosorok. Introduction to Empirical Processes and Semiparametric Inference. Springer Series in Statistics. Springer New York, 2007. ISBN 9780387749785.
- Kenneth Lange. MM optimization algorithms. SIAM, 2016.
- Mixture Density Estimation. In S. Solla, T. Leen, and K. Müller, editors, Advances in Neural Information Processing Systems, volume 12. MIT Press, 1999.
- Pascal Massart. Concentration inequalities and model selection: Ecole d’Eté de Probabilités de Saint-Flour XXXIII-2003. Springer, 2007.
- A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: Probability and Statistics, 15:41–68, 2011.
- Adaptive density estimation for clustering with Gaussian mixtures. ESAIM: Probability and Statistics, 17:698–724, 2013.
- Colin McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, 1989: Invited Papers at the Twelfth British Combinatorial Conference, London Mathematical Society Lecture Note Series, pages 148–188. Cambridge University Press, 1989.
- Colin McDiarmid. Concentration. In Michel Habib, Colin McDiarmid, Jorge Ramirez-Alfonsin, and Bruce Reed, editors, Probabilistic Methods for Algorithmic Discrete Mathematics, pages 195–248. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998. ISBN 978-3-662-12788-9.
- Finite Mixture Models. John Wiley & Sons, 2004.
- Density estimation through convex combinations of densities: Approximation and estimation bounds. Neural Networks, 10:99–109, 1997.
- Density estimation with minimization of U-divergence. Machine Learning, 90(1):29–57, January 2013.
- Tin Lok James Ng and Kwok-Kun Kwong. Universal approximation on the hypersphere. Communications in Statistics-Theory and Methods, 51:8694–8704, 2022.
- Hien D Nguyen. An introduction to majorization-minimization algorithms for machine learning and statistical estimation. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 7(2):e1198, 2017.
- An online minorization-maximization algorithm. In 17th Conference of the International Federation of Classification Societies, 2022a.
- Approximation by finite mixtures of continuous density functions that vanish at infinity. Cogent Mathematics & Statistics, 7:1750861, 2020.
- Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces. Communications in Statistics - Theory and Methods, pages 1–12, 2022b.
- A non-asymptotic approach for model selection via penalization in high-dimensional mixture of experts models. Electronic Journal of Statistics, 16(2):4742–4822, 2022c.
- H Papageorgiou and Katerina M David. On countable mixtures of bivariate binomial distributions. Biometrical Journal, 36(5):581–601, 1994.
- Leandro Pardo. Statistical Inference Based on Divergence Measures. CRC Press, Boca Raton, 2006.
- Fitting mixtures of Kent distributions to aid in joint set identification. Journal of the American Statistical Association, 96:56–63, 2001.
- Maximum $L_q$-likelihood estimation via the expectation-maximization algorithm: a robust estimation of mixture models. Journal of the American Statistical Association, 108(503):914–928, 2013.
- Risk bounds for mixture density estimation. ESAIM: Probability and Statistics, 9:220–229, 2005.
- A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23(2):1126–1153, 2013.
- Gunter Ritter. Robust cluster analysis and variable selection. CRC Press, 2014.
- Ralph Tyrrell Rockafellar. Convex Analysis, volume 11. Princeton University Press, 1997.
- Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- On Bregman distances and divergences of probability measures. IEEE Transactions on Information Theory, 58(3):1277–1288, 2012.
- Vladimir N Temlyakov. Convergence and rate of convergence of some greedy algorithms in convex optimization. Proceedings of the Steklov Institute of Mathematics, 293:325–337, 2016.
- Sara van de Geer. Estimation and Testing Under Sparsity: École d’Été de Probabilités de Saint-Flour XLV – 2015. Lecture Notes in Mathematics. Springer International Publishing, 2016. ISBN 9783319327747.
- Aad W van der Vaart and Jon A Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, 1996. ISBN 9780387946405.
- Efficient greedy learning of Gaussian mixture models. Neural Computation, 15(2):469–485, 2003.
- A greedy EM algorithm for Gaussian mixture learning. Neural Processing Letters, 15:77–87, 2002.
- Halbert White. Maximum likelihood estimation of misspecified models. Econometrica, pages 1–25, 1982.
- The MM alternative to EM. Statistical Science, 25(4):492–505, 2010.
- Tong Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory, 49(3):682–691, 2003.