Valid causal inference with unobserved confounding in high-dimensional settings (2401.06564v1)
Abstract: Various methods have recently been proposed to estimate causal effects with confidence intervals that are uniformly valid over a set of data-generating processes when high-dimensional nuisance models are estimated by post-model-selection or machine learning estimators. These methods typically require that all confounders are observed to ensure identification of the effects. We contribute by showing how valid semiparametric inference can be obtained in the presence of unobserved confounders and high-dimensional nuisance models. We propose uncertainty intervals that allow for unobserved confounding, and show that the resulting inference is valid when the amount of unobserved confounding is small relative to the sample size; this condition is formalized in terms of convergence rates. Simulation experiments illustrate the finite-sample properties of the proposed intervals and investigate an alternative procedure that improves the empirical coverage of the intervals when the amount of unobserved confounding is large. Finally, a case study on the effect of smoking during pregnancy on birth weight illustrates how the proposed methods can be used to perform a sensitivity analysis to unobserved confounding.
Authors: Niloofar Moosavi, Tetiana Gorbach, Xavier de Luna