Stability for Inference with Persistent Homology Rank Functions (2307.02904v2)

Published 6 Jul 2023 in math.AT and stat.ML

Abstract: Persistent homology barcodes and diagrams are a cornerstone of topological data analysis that capture the "shape" of a wide range of complex data structures, such as point clouds, networks, and functions. However, their use in statistical settings is challenging due to their complex geometric structure. In this paper, we revisit the persistent homology rank function, which is mathematically equivalent to a barcode and persistence diagram, as a tool for statistics and machine learning. Rank functions, being functions, enable the direct application of the statistical theory of functional data analysis (FDA), a domain of statistics adapted for data in the form of functions. A key challenge they present over barcodes in practice, however, is their lack of stability, a property that is crucial to validate their use as a faithful representation of the data and therefore a viable summary statistic. We fill this gap by deriving two stability results for persistent homology rank functions under a suitable metric for FDA integration. We then study the performance of rank functions in functional inferential statistics and machine learning on real data applications, in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing non-persistence-based approaches.
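The equivalence between a barcode and its rank function mentioned in the abstract can be made concrete with a minimal sketch. It assumes a single-parameter barcode given as half-open intervals [birth, death), in which case the rank function at (a, b) with a <= b counts the bars that are alive throughout [a, b]. The names `rank_function`, `rank_on_grid`, and `toy_barcode` are hypothetical and chosen for illustration; this is not the paper's implementation, and it does not implement the metric used in the stability results.

```python
import numpy as np


def rank_function(barcode, a, b):
    """Count bars [birth, death) that contain the whole interval [a, b].

    Assumes half-open bars, so a bar contributes when birth <= a and b < death.
    """
    if a > b:
        raise ValueError("the rank function is only defined for a <= b")
    return sum(1 for birth, death in barcode if birth <= a and b < death)


def rank_on_grid(barcode, grid):
    """Evaluate the rank function on all grid pairs (a, b) with a <= b.

    Entries below the diagonal (a > b) are left at zero. The result is a
    discretised function, which is the kind of object FDA methods take as input.
    """
    grid = np.asarray(grid, dtype=float)
    values = np.zeros((len(grid), len(grid)), dtype=int)
    for i, a in enumerate(grid):
        for j, b in enumerate(grid):
            if a <= b:
                values[i, j] = rank_function(barcode, a, b)
    return values


# Hypothetical toy barcode: two finite bars and one bar that never dies.
toy_barcode = [(0.0, 1.0), (0.5, 2.0), (0.0, float("inf"))]
toy_grid_values = rank_on_grid(toy_barcode, [0.0, 1.0, 2.0])
```

Evaluating on a grid is only one possible discretisation; the grid resolution and the treatment of infinite bars are choices left to the user in this sketch.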
