
Spurious Correlations in Machine Learning: A Survey (2402.12715v2)

Published 20 Feb 2024 in cs.LG

Abstract: Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness. In this paper, we provide a review of this issue, along with a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models. Additionally, we summarize existing datasets, benchmarks, and metrics to aid future research. The paper concludes with a discussion of the recent advancements and future challenges in this field, aiming to provide valuable insights for researchers in the related domains.
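To make the abstract's central claim concrete, here is a minimal synthetic sketch (not from the paper; all variable names and the data-generating setup are illustrative assumptions). A nearly noise-free "spurious" feature agrees with the label 95% of the time during training, while the "core" feature is informative but noisy. A plain logistic regression latches onto the spurious shortcut, so when the correlation reverses at test time, accuracy collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    """Synthetic binary task: a noisy core feature plus a spurious one."""
    y = rng.choice([-1, 1], size=n)
    # Core feature: genuinely predictive, but noisy.
    core = y + rng.normal(0, 2.0, size=n)
    # Spurious feature: agrees with the label with prob. spurious_corr
    # and is almost noise-free, so it is the "easy" shortcut.
    agree = rng.random(n) < spurious_corr
    spurious = np.where(agree, y, -y) + rng.normal(0, 0.1, size=n)
    return np.column_stack([core, spurious]), y

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Logistic regression via plain gradient descent (labels in {-1, +1})."""
    w = np.zeros(X.shape[1])
    t = (y + 1) / 2  # map labels to {0, 1}
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - t)) / len(y)
    return w

def accuracy(w, X, y):
    return np.mean(np.sign(X @ w) == y)

Xtr, ytr = make_data(5000, spurious_corr=0.95)  # shortcut holds in training
Xte, yte = make_data(5000, spurious_corr=0.05)  # distribution shift: shortcut reverses
w = fit_logreg(Xtr, ytr)

acc_train = accuracy(w, Xtr, ytr)
acc_test = accuracy(w, Xte, yte)
print("weights (core, spurious):", w)
print("train acc:", acc_train, "| test acc:", acc_test)
```

The learned weight on the spurious feature dwarfs the weight on the core feature, which is exactly the failure mode the survey taxonomizes: the model generalizes well in-distribution but degrades sharply once the spurious feature-label correlation shifts.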
