Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation (2302.07865v2)
Abstract: Distribution shift is a major source of failure for machine learning models. However, evaluating model reliability under distribution shift can be challenging, especially since it may be difficult to acquire counterfactual examples that exhibit a specified shift. In this work, we introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances from that input distribution that exhibit the desired shift. We study a number of natural implementations for such an interface, and find that they often introduce confounding shifts that complicate model evaluation. Motivated by this, we propose a dataset interface implementation that leverages Textual Inversion to tailor generation to the input distribution. We then demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts, including variations in background, lighting, and attributes of the objects. Code available at https://github.com/MadryLab/dataset-interfaces.
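To make the workflow concrete, below is a minimal sketch of how such a dataset interface can be queried once a class-specific token has been learned via Textual Inversion. It uses the Hugging Face diffusers library; the Stable Diffusion checkpoint name is standard, but the embedding path, the `<plate-rack>` token, and the "in the snow" shift are illustrative placeholders rather than the repository's actual API (see the linked GitHub repository for that).

```python
# Minimal sketch: querying a Textual Inversion-based dataset interface.
# Assumes the Hugging Face `diffusers` library; the embedding path and the
# "<plate-rack>" token are hypothetical placeholders for a class-specific
# embedding learned on the input dataset (e.g., an ImageNet class).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Load the learned token so that generations match the input distribution's
# rendition of the class rather than the text encoder's generic prior.
pipe.load_textual_inversion("embeddings/plate_rack.bin", token="<plate-rack>")

# The user-specified distribution shift is expressed directly in the prompt.
images = pipe(
    "a photo of a <plate-rack> in the snow",
    num_images_per_prompt=4,
).images

for i, img in enumerate(images):
    img.save(f"plate_rack_snow_{i}.png")
```

Evaluating a classifier on counterfactuals generated this way isolates the effect of the specified shift, provided the generation step itself does not introduce the confounding shifts that the paper identifies in more naive implementations.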
- Michael A. Alcorn et al. “Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects” In Conference on Computer Vision and Pattern Recognition (CVPR), 2019
- Romain Beaumont “Clip Retrieval: Easily compute clip embeddings and build a clip retrieval system with them” GitHub repository, https://github.com/rom1504/clip-retrieval, 2022
- Andrei Barbu et al. “ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models” In Neural Information Processing Systems (NeurIPS), 2019
- Sara Beery, Grant Van Horn and Pietro Perona “Recognition in terra incognita” In European Conference on Computer Vision (ECCV), 2018
- Gordon Christie et al. “Functional Map of the World” In Conference on Computer Vision and Pattern Recognition (CVPR), 2018
- Logan Engstrom et al. “Identifying Statistical Bias in Dataset Replication” In International Conference on Machine Learning (ICML), 2020
- Logan Engstrom et al. “Exploring the Landscape of Spatial Robustness” In International Conference on Machine Learning (ICML), 2019
- Rinon Gal et al. “An image is worth one word: Personalizing text-to-image generation using textual inversion” In arXiv preprint arXiv:2208.01618, 2022
- Irena Gao et al. “Adaptive Testing of Computer Vision Models” In arXiv preprint arXiv:2212.02774, 2022
- Dan Hendrycks et al. “The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization” In arXiv preprint arXiv:2006.16241, 2020
- Dan Hendrycks and Thomas G. Dietterich “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations” In International Conference on Learning Representations (ICLR), 2019
- Abdullah Hamdi and Bernard Ghanem “Towards Analyzing Semantic Robustness of Deep Neural Networks” In arXiv preprint arXiv:1904.04621, 2019
- Abdullah Hamdi, Matthias Müller and Bernard Ghanem “SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications” In arXiv preprint arXiv:1812.02132, 2018
- “Analyzing and Improving Neural Networks by Generating Semantic Counterexamples through Differentiable Rendering” In arXiv preprint arXiv:1910.00727, 2020
- Saachi Jain, Hannah Lawrence, Ankur Moitra and Aleksander Madry “Distilling Model Failures as Directions in Latent Space” In arXiv preprint arXiv:2206.14754, 2022
- Guillaume Jeanneret, Loïc Simon and Frédéric Jurie “Diffusion Models for Counterfactual Explanations” In Proceedings of the Asian Conference on Computer Vision, 2022, pp. 858–876
- Priyatham Kattakinda, Alexander Levine and Soheil Feizi “Invariant Learning via Diffusion Dreamed Distribution Shifts” In arXiv preprint arXiv:2211.10370, 2022
- Aengus Lynch, Jean Kaddour and Ricardo Silva “Evaluating the Impact of Geometric and Statistical Skews on Out-Of-Distribution Generalization Performance” In NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications, 2022
- Guillaume Leclerc et al. “3DB: A Framework for Debugging Computer Vision Models” In arXiv preprint arXiv:2106.03805, 2021
- Subhransu Maji et al. “Fine-grained visual classification of aircraft” In arXiv preprint arXiv:1306.5151, 2013
- John Miller et al. “Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization” In International Conference on Machine Learning (ICML), 2021, pp. 7721–7735
- Omkar M. Parkhi et al. “Cats and dogs” In Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3498–3505
- Robin Rombach et al. “High-resolution image synthesis with latent diffusion models” In Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10684–10695
- Aditya Ramesh et al. “Hierarchical text-conditional image generation with CLIP latents” In arXiv preprint arXiv:2204.06125, 2022
- Olga Russakovsky et al. “ImageNet Large Scale Visual Recognition Challenge” In International Journal of Computer Vision (IJCV), 2015
- Alec Radford et al. “Learning transferable visual models from natural language supervision” In arXiv preprint arXiv:2103.00020, 2021
- Nataniel Ruiz et al. “DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation” In arXiv preprint arXiv:2208.12242, 2022
- Benjamin Recht et al. “Do ImageNet Classifiers Generalize to ImageNet?” In International Conference on Machine Learning (ICML), 2019
- Christoph Schuhmann et al. “LAION-5B: An open large-scale dataset for training next generation image-text models” In arXiv preprint arXiv:2210.08402, 2022
- Chitwan Saharia et al. “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding” In arXiv preprint arXiv:2205.11487, 2022
- Shiori Sagawa et al. “Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization” In International Conference on Learning Representations (ICLR), 2020
- Michelle Shu et al. “Identifying Model Weakness with Adversarial Examiner” In AAAI Conference on Artificial Intelligence (AAAI), 2020
- Rohan Taori et al. “Measuring Robustness to Natural Distribution Shifts in Image Classification” In Neural Information Processing Systems (NeurIPS), 2020
- Olivia Wiles, Isabela Albuquerque and Sven Gowal “Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning” In arXiv preprint arXiv:2208.08831, 2022
- Haohan Wang et al. “Learning robust global representations by penalizing local predictive power” In Neural Information Processing Systems (NeurIPS), 2019
- Ross Wightman “PyTorch Image Models” GitHub repository, https://github.com/rwightman/pytorch-image-models, 2019. DOI: 10.5281/zenodo.4414861
- Kai Xiao et al. “Noise or signal: The role of image backgrounds in object recognition” In arXiv preprint arXiv:2006.09994, 2020
- “Not Just Pretty Pictures: Text-to-Image Generators Enable Interpretable Interventions for Robust Representations” In arXiv preprint arXiv:2212.11237, 2022
Authors: Joshua Vendrow, Saachi Jain, Logan Engstrom, Aleksander Madry