Optimal Transport for Structure Learning Under Missing Data (2402.15255v2)
Abstract: Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, the causal relations among variables. Merely filling in missing values with existing imputation methods and subsequently applying structure learning on the completed data is empirically shown to be sub-optimal. To address this problem, we propose a score-based algorithm for learning causal structures from missing data based on optimal transport. This optimal transport viewpoint diverges from existing score-based approaches, which are predominantly based on expectation maximization. We formulate structure learning as a density-fitting problem, where the goal is to find the causal model that induces the distribution closest in Wasserstein distance to the observed data distribution. Our framework is shown to recover the true causal graphs more effectively than competing methods in most simulations and real-data settings. Empirical evidence also shows the superior scalability of our approach, along with the flexibility to incorporate any off-the-shelf causal discovery method for complete data.
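The density-fitting idea can be illustrated with a toy sketch (not the paper's algorithm): score each candidate causal mechanism by the empirical Wasserstein distance between the data it induces and the observed data, and pick the minimizer. For simplicity this uses the closed-form 1D (marginal) 1-Wasserstein distance between sorted samples, whereas the paper works with joint distributions; the names `wasserstein_1d` and `score` are illustrative only.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between equal-sized 1D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)                    # cause
x2 = 1.5 * x1 + 0.5 * rng.normal(size=n)   # effect; true weight is 1.5

eps = rng.normal(size=n)                   # fixed noise draw for the model

def score(w):
    # Distribution induced by the candidate mechanism x2_hat = w*x1 + noise;
    # a smaller distance to the observed x2 means a better density fit.
    x2_hat = w * x1 + 0.5 * eps
    return wasserstein_1d(x2, x2_hat)

candidates = np.linspace(0.0, 3.0, 31)
best = min(candidates, key=score)          # should land near the true 1.5
```

A grid search over a single weight stands in for the paper's optimization over graph structures and parameters; the selection principle, minimizing a Wasserstein discrepancy with the observed distribution, is the same.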
Authors: Vy Vo, He Zhao, Trung Le, Edwin V. Bonilla, Dinh Phung