
Bayesian Vector AutoRegression with Factorised Granger-Causal Graphs (2402.03614v2)

Published 6 Feb 2024 in cs.LG and stat.ML

Abstract: We study the problem of automatically discovering Granger causal relations from observational multivariate time-series data. Vector autoregressive (VAR) models have been time-tested for this problem, including Bayesian variants and more recent developments using deep neural networks. Most existing VAR methods for Granger causality use sparsity-inducing penalties/priors or post-hoc thresholds to interpret their coefficients as Granger causal graphs. Instead, we propose a new Bayesian VAR model with a hierarchical factorised prior distribution over binary Granger causal graphs, separate from the VAR coefficients. We develop an efficient algorithm to infer the posterior over binary Granger causal graphs. Comprehensive experiments on synthetic, semi-synthetic, and climate data show that our method is more uncertainty-aware, has fewer hyperparameters, and achieves better performance than competing approaches, especially in low-data regimes with fewer observations.


Summary

  • The paper introduces a Bayesian VAR model that decouples causal relations from VAR coefficients for direct sampling of potential causal graphs.
  • It employs a hierarchical sparsity-inducing mechanism to robustly quantify uncertainty in Granger causal inference on sparse time series data.
  • The efficient Gibbs sampling inference yields stable convergence and superior performance compared to traditional and deep learning-based VAR models.

Introduction

Interpreting causal relationships in multivariate time-series (MTS) data is crucial across fields ranging from neuroscience to economics. Granger causality provides a valuable framework for discovering such relations: it quantifies how well past values of one series help predict the future values of another. Traditional vector autoregressive (VAR) models and their Bayesian counterparts are commonly adopted for this analysis, with recent advances integrating deep learning techniques. However, these methods often require large amounts of data, are sensitive to hyperparameter settings, and rarely provide the uncertainty quantification needed for decision-making tasks, a gap this research aims to bridge.
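The core Granger idea can be illustrated with a plain least-squares sketch (not the paper's Bayesian model): fit one autoregression of a series on its own past, fit another that also uses the past of a second series, and compare residual variances. All names below are illustrative.

```python
import numpy as np

def variance_ratio(x, y, lag=2):
    """Ratio of residual sums of squares: model of x from its own past
    vs. a model that also uses the past of y. Values well above 1
    suggest y Granger-causes x. Classic idea only, via least squares."""
    T = len(x)
    X_own = np.array([x[t - lag:t] for t in range(lag, T)])
    X_both = np.array([np.r_[x[t - lag:t], y[t - lag:t]] for t in range(lag, T)])
    target = x[lag:]

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        return float(resid @ resid)

    return rss(X_own) / rss(X_both)

# Synthetic example: y drives x, but not the other way round.
rng = np.random.default_rng(0)
y = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.normal()

print(variance_ratio(x, y))  # well above 1: past of y improves prediction of x
print(variance_ratio(y, x))  # near 1: past of x barely helps predict y
```

Classical tests threshold a statistic derived from exactly this comparison; the paper's contribution is to replace such point decisions with a posterior over the whole causal graph.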

Bayesian VAR for Granger Causality

The proposed Bayesian VAR model with a hierarchical graph prior introduces a new approach to discovering binary Granger causal relations by distinctly modelling the underlying causal structure of MTS data. By decoupling binary causal relations from the VAR coefficients, this Bayesian framework allows direct sampling of potential causal graphs, providing a principled measure of uncertainty in causal inference.
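The decoupling idea can be sketched generatively. In the toy code below, `G` and `W` are hypothetical names: a binary graph and separate real-valued weights, with only their masked product entering the VAR dynamics. The paper's actual prior over the graph is hierarchical and factorised; this sketch only shows how edge presence and edge strength become distinct variables.

```python
import numpy as np

rng = np.random.default_rng(1)
D, T = 4, 300  # number of series and time steps

# Binary graph G and separate real-valued weights W; only the masked
# product W * G enters the lag-1 VAR, so "is there an edge?" and
# "how strong is it?" are modelled by different variables.
G = rng.binomial(1, 0.3, size=(D, D))  # G[i, j] = 1: series j drives series i
np.fill_diagonal(G, 1)                 # keep self-lags
W = rng.normal(size=(D, D))
A = W * G
A *= 0.9 / max(np.abs(np.linalg.eigvals(A)).max(), 1e-8)  # keep dynamics stable

X = np.zeros((T, D))
for t in range(1, T):
    X[t] = A @ X[t - 1] + 0.1 * rng.normal(size=D)

# Zeros in G translate directly into absent Granger-causal links.
print(G)
```

Because the graph is an explicit random variable rather than a pattern read off thresholded coefficients, posterior inference can sample entire candidate graphs directly.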

Unlike existing models that induce sparsity through penalised coefficients or after-the-fact thresholding, the proposed model employs a sparsity-inducing mechanism intrinsic to its Bayesian hierarchical structure. Through this mechanism, the method efficiently adapts to the sparsity level of the given MTS dataset.
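Why a hierarchical prior can adapt to sparsity is easiest to see in the simplest conjugate case. The Beta-Bernoulli sketch below is illustrative only (the paper's prior is a richer factorised hierarchy): placing a prior on the edge probability itself lets its posterior track however sparse the data turn out to be, instead of fixing a sparsity penalty up front.

```python
import numpy as np

a, b = 1.0, 1.0  # vague Beta(1, 1) hyperprior on graph density

def posterior_density(graph):
    """Closed-form Beta posterior mean of the edge probability,
    given an observed binary adjacency matrix."""
    k, n = graph.sum(), graph.size
    return (a + k) / (a + b + n)

rng = np.random.default_rng(4)
sparse_graph = rng.binomial(1, 0.05, size=(20, 20))
dense_graph = rng.binomial(1, 0.6, size=(20, 20))

print(posterior_density(sparse_graph))  # near 0.05
print(posterior_density(dense_graph))   # near 0.6
```

The same logic, applied hierarchically, removes the need to hand-tune a sparsity hyperparameter per dataset.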

Inferential Strengths and Performance

The hierarchical graph prior introduced in this model offers several advantages:

  • It promotes robust uncertainty quantification by modelling the posterior over Granger causal graphs.
  • It contains a flexible sparsity control mechanism suitable for sparse datasets.
  • It reduces reliance on hyperparameter tuning, which is particularly beneficial where no ground-truth graph is available for validation.
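The first advantage is concrete: given posterior samples of binary graphs (e.g. from a Gibbs sampler), edge probabilities are just sample means, and values far from 0 or 1 flag edges the data cannot resolve. The sampler below is faked from a fixed probability table purely to show how posterior graph draws would be summarised.

```python
import numpy as np

rng = np.random.default_rng(2)
D, S = 3, 1000
# Hypothetical per-edge posterior probabilities used to fake S draws.
true_prob = np.array([[0.95, 0.05, 0.90],
                      [0.10, 0.90, 0.05],
                      [0.50, 0.05, 0.95]])
samples = rng.binomial(1, true_prob, size=(S, D, D))  # S posterior graph draws

edge_prob = samples.mean(axis=0)               # posterior P(edge i <- j)
confident = (edge_prob > 0.9) | (edge_prob < 0.1)
print(edge_prob.round(2))
print("uncertain edges:", int((~confident).sum()))  # e.g. entry near 0.5
```

A thresholded point estimate would silently commit to one answer on the ambiguous edge; the posterior makes the ambiguity visible.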

Comprehensively tested on synthetic, semi-synthetic, and climate benchmarks, the proposed method outperforms both established Bayesian VAR methods and more recent deep learning-based VAR models. This advantage is especially pronounced on sparse time-series data with few observations.

Efficient Inference Algorithm

Crucially, the efficiency of the algorithm stems from the closed-form posteriors available for all variables in the model, which enable effective Gibbs sampling-based inference. This makes the method practical in applied settings, offering stable convergence and fast computation.
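The flavour of such a sampler can be sketched with a deliberately simplified model, not the paper's: a linear model y = X(z * w) + noise where, for fixed weights w, each binary inclusion indicator z_j has a closed-form Bernoulli conditional, so the sweep below is exact Gibbs on z. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = np.array([1.5, 0.0, -2.0, 0.0, 1.0])       # features 1 and 3 inactive
y = X @ w_true + 0.5 * rng.normal(size=n)

w = np.array([1.5, 1.0, -2.0, 1.0, 1.0])            # fixed weights; only z sampled
sigma2, prior_pi = 0.25, 0.5
z = np.ones(d, dtype=int)
counts = np.zeros(d)
sweeps = 500
for _ in range(sweeps):
    for j in range(d):
        # Residual with term j removed, then Gaussian log-likelihoods
        # for z_j = 1 vs z_j = 0 -- a closed-form Bernoulli conditional.
        resid = y - X @ (z * w) + X[:, j] * (z[j] * w[j])
        ll1 = -0.5 * np.sum((resid - X[:, j] * w[j]) ** 2) / sigma2
        ll0 = -0.5 * np.sum(resid ** 2) / sigma2
        m = max(ll1, ll0)
        p1 = prior_pi * np.exp(ll1 - m)
        p0 = (1 - prior_pi) * np.exp(ll0 - m)
        z[j] = int(rng.random() < p1 / (p1 + p0))
    counts += z

incl = counts / sweeps  # posterior inclusion probabilities
print(incl.round(2))    # high for features 0, 2, 4; low for 1 and 3
```

Because every conditional is sampled exactly rather than approximated, such sweeps mix quickly, which is the practical payoff of having closed-form posteriors throughout the model.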

Conclusion

The paper presents a Bayesian VAR method that addresses long-standing challenges in Granger causal discovery from MTS data. By providing reliable uncertainty quantification and requiring fewer hyperparameters, it sets a strong baseline for sparse MTS datasets. While it assumes linear dynamics and thus may not directly extend to strongly nonlinear systems, the Poisson Factorised Granger-Causal Graph (PFGCG) model stands as a potent tool for causal analysis, given the foundational role of causal reasoning in scientific enquiry and the growing demand for robust, interpretable machine-learning methods.

The introduction of the PFGCG model marks a shift towards causal-inference techniques that accommodate data sparsity and provide clear uncertainty measures, addressing key concerns in critical decision-making scenarios.
