
Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization (2311.15145v3)

Published 26 Nov 2023 in cs.CV

Abstract: Domain Generalization (DG), a crucial research area, seeks to train models across multiple domains and test them on unseen ones. In this paper, we introduce a novel approach, Selective Cross-Modality Distillation for Domain Generalization (SCMD). SCMD leverages the capabilities of large vision-language models, specifically CLIP, to train a more efficient student model, ensuring it acquires robust generalization capabilities across unseen domains. Our primary contribution is a unique selection framework strategically designed to identify hard-to-learn samples for distillation. In parallel, we introduce a novel cross-modality module that seamlessly combines the projected features of the student model with the text embeddings from CLIP, ensuring the alignment of their similarity distributions. We assess SCMD's performance on various benchmarks, where it enables a ResNet50 to deliver state-of-the-art performance, surpassing existing domain generalization methods. Furthermore, we provide a theoretical analysis of our selection strategy, offering deeper insight into its effectiveness and potential in the field of DG.
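The two components the abstract describes, hard-sample selection and similarity-distribution alignment against CLIP's text embeddings, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the temperature value, the top-k selection criterion, and all tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_modality_distill_loss(student_feats, teacher_feats, text_embeds, tau=0.07):
    """Align the student's image-to-text similarity distribution with the
    teacher's (CLIP's) via KL divergence. All inputs are raw feature tensors:
    student_feats/teacher_feats are (batch, dim), text_embeds is (classes, dim).
    """
    txt = F.normalize(text_embeds, dim=-1)
    # Cosine-similarity logits against the class text embeddings, scaled by tau.
    s_logits = F.normalize(student_feats, dim=-1) @ txt.T / tau
    t_logits = F.normalize(teacher_feats, dim=-1) @ txt.T / tau
    return F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )

def select_hard_samples(per_sample_loss, k):
    """One plausible reading of 'hard-to-learn': keep the k samples the
    student currently gets most wrong (highest per-sample loss)."""
    return torch.topk(per_sample_loss, k).indices
```

In a training loop one would compute a per-sample loss first, call `select_hard_samples` to pick the distillation subset, and apply `cross_modality_distill_loss` only to those samples alongside the usual classification loss.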

