Analysis of Estimating the Bayes Rule for Gaussian Mixture Models with a Specified Missing-Data Mechanism (2210.13785v2)
Abstract: Semi-supervised learning (SSL) approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (2020). We show that, in a partially classified sample, a classifier using the Bayes rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model, especially when both the class overlap and the proportion of missing class labels are low to moderate, or when the overlap is large but few labels are missing. It also outperforms a classifier with no missing-data mechanism regardless of the overlap region or the proportion of missing class labels. Our exploration of two- and three-component normal mixture models with unequal covariances through simulations further corroborates our findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
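For orientation, the Bayes rule of allocation in the two-class normal homoscedastic model mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes the mixing proportion, class means, and common covariance matrix are known; it does not model the missing-data mechanism or the estimation from a partially classified sample studied in the paper. The function name `bayes_rule_two_class` and its parameter names are hypothetical, chosen only for this sketch.

```python
import numpy as np


def bayes_rule_two_class(y, pi1, mu1, mu2, sigma):
    """Allocate each row of y to class 1 or 2.

    Assumes (for illustration only) a two-class normal model with known
    prior pi1 for class 1, known means mu1 and mu2, and a common
    covariance matrix sigma shared by both classes.
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # Linear discriminant coefficients: beta = sigma^{-1} (mu1 - mu2).
    beta = np.linalg.solve(sigma, mu1 - mu2)
    # Intercept: log prior odds minus half of (mu1 + mu2)^T beta.
    beta0 = np.log(pi1 / (1.0 - pi1)) - 0.5 * (mu1 + mu2) @ beta

    # Log posterior odds of class 1 versus class 2 for each observation.
    log_odds = beta0 + y @ beta
    # Assign to class 1 when the posterior odds favour it, else class 2.
    return np.where(log_odds > 0, 1, 2)


# Usage: two well-separated classes with identity covariance.
labels = bayes_rule_two_class(
    [[2.0, 0.0], [-2.0, 0.0]],
    pi1=0.5,
    mu1=[1.0, 0.0],
    mu2=[-1.0, 0.0],
    sigma=np.eye(2),
)
```

With a common covariance matrix the log posterior odds are linear in the observation, which is why the rule reduces to checking the sign of `beta0 + y @ beta`.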
- Statistics and Computing pp. 1–12 (2020)
- Econometrics and Statistics 26, 124–138 (2023)
- Biometrika 50(1/2), 17–21 (1963)
- In: Proceedings of the eleventh annual conference on Computational learning theory, pp. 92–100 (1998)
- MIT Press, Cambridge, MA, USA (2010)
- Journal of Artificial Intelligence Research 23, 331–366 (2005)
- Technometrics 53(4), 406–413 (2011)
- Pattern Recognition 42(3), 334–348 (2009)
- Efron, B.: The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association 70(352), 892–898 (1975)
- IEEE Transactions on Pattern Analysis and Machine Intelligence 30(3), 424–437 (2008)
- Gilbert, E.S.: The effect of unequal variance-covariance matrices on Fisher’s linear discriminant function. Biometrics pp. 505–515 (1969)
- Han, C.P.: Distribution of discriminant function when covariance matrices are proportional. The Annals of Mathematical Statistics 40(3), 979–985 (1969)
- The Canadian Journal of Statistics/La Revue Canadienne de Statistique pp. 261–270 (1982)
- In: Eleventh Annual Conference of the International Speech Communication Association (2010)
- In: ICML, vol. 99, pp. 200–209 (1999)
- Pattern Recognition 40(4), 1207–1221 (2007)
- Frontiers in Cellular Neuroscience 17 (2023)
- Journal of Machine Learning Research 5(Jan), 27–72 (2004)
- Statistics and Computing 24(2), 181–202 (2014)
- Diagnostics 10 (2020)
- arXiv preprint arXiv:2302.13206 (2023)
- Journal of the American Statistical Association 69(346), 555–559 (1974)
- McLachlan, G.J.: Iterative reclassification procedure for constructing an asymptotically optimal rule of allocation in discriminant analysis. Journal of the American Statistical Association 70(350), 365–369 (1975)
- McLachlan, G.J.: Some expected values for the error rates of the sample quadratic discriminant function. Australian Journal of Statistics 17(3), 161–165 (1975)
- McLachlan, G.J.: Estimating the linear discriminant function from initial samples containing a small number of unclassified observations. Journal of the American Statistical Association 72(358), 403–406 (1977). DOI 10.1080/01621459.1977.10481009
- Statistics in Medicine 8(10), 1291–1300 (1989). DOI 10.1002/sim.4780081012
- Biometrika 102(4), 995–1000 (2015)
- Scientific Data 6(1), 1–6 (2019)
- O’Neill, T.J.: Normal discrimination with unclassified observations. Journal of the American Statistical Association 73(364), 821–826 (1978)
- Bioinformatics 22(19), 2388–2395 (2006)
- Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
- Journal of Computational Biology 17(8), 953–967 (2010)
- Advances in Neural Information Processing Systems 14 (2001)
- Vapnik, V.: The support vector method of function estimation pp. 55–85 (1998)
- Scientific Reports 11, 17611 (2021). DOI 10.1038/s41598-021-96745-2
- International Journal of Molecular Sciences 24 (2023)
- Advances in Neural Information Processing Systems 16 (2003)