Feature Protection For Out-of-distribution Generalization (2405.16027v1)

Published 25 May 2024 in cs.LG

Abstract: With the availability of large pre-trained models, a modern workflow for building real-world machine learning solutions is to fine-tune such models on a downstream task with a relatively small domain-specific dataset. In such applications, a major challenge is that the small fine-tuning dataset does not sufficiently cover the distribution encountered when the model is deployed. It is thus important to design fine-tuning methods that are robust to out-of-distribution (OOD) data under-represented in the training data. This paper compares common fine-tuning methods to investigate their OOD performance and demonstrates that standard methods change the pre-trained model significantly, so that the fine-tuned features overfit the fine-tuning dataset, which in turn degrades OOD performance. To overcome this issue, we show that protecting pre-trained features yields a fine-tuned model that generalizes better to OOD data. We validate the feature-protection methods with extensive experiments fine-tuning CLIP on ImageNet and DomainNet.
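
This page does not detail the paper's exact feature-protection procedure, so the sketch below illustrates one common way to realize the idea: regularizing fine-tuning so that image features stay close to those produced by a frozen copy of the pre-trained encoder (in the spirit of L2-SP or feature distillation). It assumes a CLIP-style model exposing an encode_image method plus a separate classifier head; train_step, lam, and the MSE penalty are illustrative assumptions, not the paper's method.

import torch
import torch.nn.functional as F

def train_step(model, frozen_model, classifier, optimizer, images, labels, lam=1.0):
    """One fine-tuning step with a feature-protection penalty (illustrative sketch)."""
    optimizer.zero_grad()
    feats = model.encode_image(images)                 # features being fine-tuned
    with torch.no_grad():
        ref_feats = frozen_model.encode_image(images)  # frozen pre-trained reference
    task_loss = F.cross_entropy(classifier(feats), labels)
    protect_loss = F.mse_loss(feats, ref_feats)        # penalize drift from pre-trained features
    loss = task_loss + lam * protect_loss              # lam trades downstream fit vs. protection
    loss.backward()
    optimizer.step()
    return loss.item()

# Setup sketch: keep an untouched copy of the pre-trained encoder as the reference.
# import copy
# frozen_model = copy.deepcopy(model).eval()
# for p in frozen_model.parameters():
#     p.requires_grad_(False)

With lam=0 this reduces to standard fine-tuning; larger lam anchors the features more strongly to the pre-trained encoder, which the abstract suggests should help on OOD data at some possible cost to in-distribution fit.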
