
Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm (2310.12285v2)

Published 18 Oct 2023 in stat.ME, stat.CO, and stat.ML

Abstract: High-dimensional longitudinal data is increasingly used in a wide range of scientific studies. To properly account for dependence between longitudinal observations, statistical methods for high-dimensional linear mixed models (LMMs) have been developed. However, few packages implementing these high-dimensional LMMs are available in the statistical software R. Additionally, some packages suffer from scalability issues. This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs. We use empirical Bayes estimators of hyperparameters for increased flexibility and an Expectation-Conditional-Maximization (ECM) algorithm for computationally efficient maximum a posteriori probability (MAP) estimation of parameters. The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation. We illustrate Linear Mixed Modeling with PaRtitiOned empirical Bayes ECM (LMM-PROBE) in simulation studies evaluating fixed and random effects estimation along with computation time. A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time. Supplementary materials are available online.

