Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization (2403.10298v1)
Abstract: Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, comparatively little effort has been devoted to assessing the quality of the extracted visual representations. Intuitively, a network may struggle to capture discriminative features from low-quality samples, leading to a significant decline in FGVC performance. To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. In this network, we design a novel multi-part and multi-scale cross-attention (MPMSCA) module that models the spatial contextual relationship between rich part descriptors and global semantics, capturing more discriminative details within the object. Before features are fed to the MPMSCA module, a part navigator is developed to resolve scale confusion and accurately identify locally distinctive regions. Furthermore, we propose a generic multi-level semantic quality evaluation module (MLSQE) to progressively supervise and enhance hierarchical semantics from different levels of the backbone network. Finally, the context-aware features from MPMSCA and the semantically enhanced features from MLSQE are fed into corresponding quality probing classifiers, which evaluate their quality in real time and thereby boost the discriminability of the feature representations. Comprehensive experiments on four popular and highly competitive FGVC datasets demonstrate the superiority of the proposed CSQA-Net over state-of-the-art methods.
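The core idea behind the MPMSCA module, cross-attention in which part descriptors act as queries over global semantic features, can be sketched as follows. This is a minimal single-head illustration in NumPy, not the paper's exact multi-part, multi-scale configuration; the function name, shapes, and projection matrices `Wq`, `Wk`, `Wv` are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def part_global_cross_attention(parts, global_feats, Wq, Wk, Wv):
    """Hypothetical sketch of part-to-global cross-attention.

    parts:        (P, D) part descriptors, used as queries.
    global_feats: (N, D) global semantic tokens, used as keys/values.
    Wq, Wk, Wv:   (D, D) learned projection matrices (random here).
    """
    q = parts @ Wq
    k = global_feats @ Wk
    v = global_feats @ Wv
    scale = 1.0 / np.sqrt(q.shape[-1])
    # (P, N): each part descriptor attends over all global tokens,
    # modeling the spatial-contextual relationship between parts and semantics.
    attn = softmax(q @ k.T * scale)
    # Residual connection keeps each part's own identity in the output.
    return parts + attn @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, N, D = 4, 16, 8
    parts = rng.standard_normal((P, D))
    global_feats = rng.standard_normal((N, D))
    Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))
    out = part_global_cross_attention(parts, global_feats, Wq, Wk, Wv)
    print(out.shape)  # (4, 8): one context-enriched descriptor per part
```

In the full model, such context-aware part features would then be scored by a quality probing classifier rather than used directly for prediction.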
- Qin Xu
- Sitong Li
- Jiahui Wang
- Bo Jiang
- Jinhui Tang