TFS-ViT: Token-Level Feature Stylization for Domain Generalization (2303.15698v3)
Abstract: Standard deep learning models such as convolutional neural networks (CNNs) lack the ability to generalize to domains not seen during training. This problem stems mainly from the common but often incorrect assumption that source and target data are drawn from the same i.i.d. distribution. Recently, Vision Transformers (ViTs) have shown outstanding performance on a broad range of computer vision tasks; however, few studies have investigated their ability to generalize to new domains. This paper presents Token-level Feature Stylization (TFS-ViT), a first approach of its kind for domain generalization, which improves the performance of ViTs on unseen data by synthesizing new domains. Our approach transforms token features by mixing the normalization statistics of images from different domains. We further improve this approach with a novel strategy for attention-aware stylization, which uses the attention maps of class (CLS) tokens to compute and mix the normalization statistics of tokens corresponding to different image regions. The proposed method is agnostic to the choice of backbone and can be applied to any ViT-based architecture with a negligible increase in computational complexity. Comprehensive experiments show that our approach achieves state-of-the-art performance on five challenging domain-generalization benchmarks and can handle different types of domain shift. The implementation is available at: https://github.com/Mehrdad-Noori/TFS-ViT_Token-level_Feature_Stylization.
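The core idea of mixing normalization statistics of token features can be sketched as follows. This is a minimal NumPy illustration under the assumption that TFS interpolates per-channel token statistics between samples in a MixStyle/AdaIN-like fashion; the function name, the Beta-distributed mixing coefficient, and the batch-shuffling scheme are illustrative choices, not the authors' exact implementation.

```python
import numpy as np

def token_feature_stylization(x, alpha=0.1, eps=1e-6, rng=None):
    """Illustrative token-level feature stylization.

    x : array of shape (B, N, D) -- a batch of B samples, each with
        N token embeddings of dimension D.

    For each sample, per-channel mean/std are computed over its tokens,
    then interpolated with the statistics of another (shuffled) sample
    in the batch; the tokens are re-normalized with the mixed statistics.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    B = x.shape[0]
    mu = x.mean(axis=1, keepdims=True)           # (B, 1, D) token means
    sigma = x.std(axis=1, keepdims=True) + eps   # (B, 1, D) token stds
    perm = rng.permutation(B)                    # pair each sample with another
    lam = rng.beta(alpha, alpha, size=(B, 1, 1)) # mixing coefficients
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    # Normalize with the sample's own stats, re-style with the mixed ones.
    return sigma_mix * (x - mu) / sigma + mu_mix
```

The attention-aware variant described in the abstract would additionally weight each token's contribution to `mu` and `sigma` by the CLS-token attention map, so that statistics are computed and mixed per image region rather than uniformly over all tokens.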
- Mehrdad Noori
- Milad Cheraghalikhani
- Ali Bahri
- Gustavo A. Vargas Hakim
- David Osowiechi
- Ismail Ben Ayed
- Christian Desrosiers