Cross-Validation for Training and Testing Co-occurrence Network Inference Algorithms (2309.15225v1)

Published 26 Sep 2023 in cs.LG and q-bio.QM

Abstract: Microorganisms are found in almost every environment, including soil, water, air, and the bodies of other organisms such as animals and plants. While some microorganisms cause diseases, most of them help in biological processes such as decomposition, fermentation and nutrient cycling. Much research has gone into studying microbial communities in various environments and how their interactions and relationships can provide insights into disease. Co-occurrence network inference algorithms help us understand the complex associations of microorganisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, which have different hyper-parameters that determine the sparsity of the network. Previous methods for evaluating the quality of the inferred network include using external data and network consistency across sub-samples, both of which have several drawbacks that limit their applicability to real microbiome composition data sets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data. Our empirical study shows that the proposed method is useful for hyper-parameter selection (training) and for comparing the quality of the inferred networks between different algorithms (testing).
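The abstract does not spell out how an inferred network is used to predict test data, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' implementation. It assumes Lasso neighborhood regression (one form of the regularized linear regression mentioned above) as the inference algorithm, treats the L1 penalty `lam` as the sparsity hyper-parameter, and scores a fitted network by the mean squared error of predicting each held-out taxon from the other taxa. All function names (`standardize`, `lasso`, `fit_network`, `edges`, `heldout_error`, `cv_select`) are hypothetical, the data are synthetic, and real abundance tables would normally need preprocessing (for example a log transform) first.

```python
import random
import statistics


def standardize(train, test):
    """Center and scale every column with training-set statistics only."""
    p = len(train[0])
    means = [statistics.fmean([row[j] for row in train]) for j in range(p)]
    # Guard against zero-variance taxa by falling back to a scale of 1.
    sds = [statistics.pstdev([row[j] for row in train]) or 1.0 for j in range(p)]

    def scale(rows):
        return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

    return scale(train), scale(test)


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0:
                continue
            # Correlation of column k with the partial residual that excludes k.
            rho = sum(X[i][k] * resid[i] for i in range(n)) / n + col_sq[k] * b[k]
            new = soft_threshold(rho, lam) / col_sq[k]
            delta = new - b[k]
            if delta:
                for i in range(n):
                    resid[i] -= delta * X[i][k]
                b[k] = new
    return b


def fit_network(train, lam):
    """Regress each taxon on all others; models[j] maps neighbor k -> coefficient."""
    p = len(train[0])
    models = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        X = [[row[k] for k in others] for row in train]
        y = [row[j] for row in train]
        models.append(dict(zip(others, lasso(X, y, lam))))
    return models


def edges(models):
    """Undirected edge set: (j, k) is an edge if either regression uses the other."""
    return {tuple(sorted((j, k)))
            for j, coefs in enumerate(models) for k, b in coefs.items() if b != 0.0}


def heldout_error(models, test):
    """Mean squared error of predicting each test taxon from the other taxa."""
    sq = [(row[j] - sum(b * row[k] for k, b in coefs.items())) ** 2
          for row in test for j, coefs in enumerate(models)]
    return sum(sq) / len(sq)


def cv_select(data, lambdas, n_folds=3, seed=0):
    """K-fold CV: return the lambda with the lowest mean held-out error."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = {}
    for lam in lambdas:
        errs = []
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in idx if i not in held_out]
            tr, te = standardize(train, [data[i] for i in fold])
            errs.append(heldout_error(fit_network(tr, lam), te))
        scores[lam] = statistics.fmean(errs)
    return min(scores, key=scores.get), scores


# Synthetic data (hypothetical): taxon 1 tracks taxon 0; taxa 2 and 3 are noise.
rng = random.Random(1)
data = []
for _ in range(60):
    a = rng.gauss(0, 1)
    data.append([a, a + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)])

best, scores = cv_select(data, [0.01, 0.1, 0.5, 2.0])
print("CV error per lambda:", scores, "selected:", best)
```

Every `lam` is evaluated on the same folds, so the held-out errors are directly comparable; in the same spirit, networks produced by different inference algorithms could be compared by fitting them on identical training folds and scoring them with the same held-out error. With a very large penalty (here `lam = 2.0`) every coefficient is zero, the network is empty, and the held-out error stays near that of predicting the training mean.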

Authors (4)
  1. Daniel Agyapong
  2. Jeffrey Ryan Propster
  3. Jane Marks
  4. Toby Dylan Hocking