High-Dimensional Independence Testing via Maximum and Average Distance Correlations (2001.01095v2)
Abstract: This paper introduces and investigates the use of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, assess the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and applicable to both Euclidean distance and the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, average distance correlation, and the original distance correlation across various multivariate dependence scenarios, and we conduct a real data experiment that tests for dependence between various cancer types and peptide levels in human plasma.
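To make the two statistics named in the abstract concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not confirm: that "maximum" and "average" are taken over the distance correlations between each single dimension of X and the full Y, and that the plain (biased) sample distance correlation with Euclidean distance is an acceptable stand-in for whatever estimator the paper uses. The function names `distance_correlation` and `max_and_average_dcor` are hypothetical, and the chi-square testing step is not shown.

```python
import numpy as np


def _pairwise_dist(z):
    """Euclidean distance matrix between the rows of z (1-D input is treated as a column)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Biased sample distance correlation, a value in [0, 1]."""
    a = _double_center(_pairwise_dist(x))
    b = _double_center(_pairwise_dist(y))
    dcov2 = (a * b).mean()                      # squared sample distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:                              # a constant sample has zero distance variance
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def max_and_average_dcor(x, y):
    """Assumed construction: distance correlation of each column of x against y,
    summarized by its maximum and its average."""
    x = np.asarray(x, dtype=float)
    per_dim = np.array([distance_correlation(x[:, j], y) for j in range(x.shape[1])])
    return per_dim.max(), per_dim.mean(), per_dim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 20
    x = rng.normal(size=(n, p))
    y = x[:, 0] ** 2 + 0.1 * rng.normal(size=n)  # only the first dimension carries signal
    max_dc, avg_dc, _ = max_and_average_dcor(x, y)
    print(f"max dcor = {max_dc:.3f}, average dcor = {avg_dc:.3f}")
```

Under this reading, the maximum picks up a signal concentrated in a few dimensions, while the average dilutes it across the uninformative ones, which matches the abstract's framing in terms of the number of marginally dependent dimensions.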