Embracing the black box: Heading towards foundation models for causal discovery from time series data (2402.09305v1)

Published 14 Feb 2024 in cs.LG and cs.AI

Abstract: Causal discovery from time series data encompasses many existing solutions, including those based on deep learning techniques. However, these methods typically do not embrace one of the most prevalent paradigms in deep learning: end-to-end learning. To address this gap, we explore what we call Causal Pretraining, a methodology that aims to learn a direct mapping from multivariate time series to the underlying causal graphs in a supervised manner. Our empirical findings suggest that supervised causal discovery is possible, assuming that the training and test time series samples share most of their dynamics. More importantly, we found evidence that the performance of Causal Pretraining can increase with data and model size, even if the additional data do not share the same dynamics. Further, we provide examples where causal discovery for real-world data with causally pretrained neural networks is possible within limits. We argue that this hints at the possibility of a foundation model for causal discovery.
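As a reading aid, here is a minimal PyTorch sketch of the Causal Pretraining idea described in the abstract: a network trained end-to-end, in a supervised fashion, to map a multivariate time series window directly to a matrix of causal-edge logits. The encoder architecture, all names, and the synthetic-data stand-ins are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch of Causal Pretraining: supervised learning of a direct
# mapping from a multivariate time series to its causal adjacency matrix.
import torch
import torch.nn as nn

class CausalPretrainedNet(nn.Module):
    def __init__(self, n_vars: int, seq_len: int, hidden: int = 128):
        super().__init__()
        # Flatten the (seq_len, n_vars) window and map it straight to an
        # n_vars x n_vars matrix of edge logits (end-to-end, no CI tests).
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(seq_len * n_vars, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_vars * n_vars),
        )
        self.n_vars = n_vars

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_vars) -> edge logits (batch, n_vars, n_vars)
        return self.encoder(x).view(-1, self.n_vars, self.n_vars)

def pretraining_step(model, series, graphs, optimizer):
    """One supervised step on synthetic (series, ground-truth graph) pairs."""
    logits = model(series)
    # Treat each potential edge as an independent binary label.
    loss = nn.functional.binary_cross_entropy_with_logits(logits, graphs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: 5 variables, windows of length 50.
model = CausalPretrainedNet(n_vars=5, seq_len=50)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
series = torch.randn(32, 50, 5)                   # stand-in simulated dynamics
graphs = torch.randint(0, 2, (32, 5, 5)).float()  # stand-in ground-truth edges
loss = pretraining_step(model, series, graphs, opt)
```

Under this reading, a causally pretrained model is applied zero-shot at test time: a new series is fed forward once, and the sigmoid of the logits is thresholded to read off the predicted causal graph.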

Authors (3)
  1. Gideon Stein
  2. Maha Shadaydeh
  3. Joachim Denzler
