Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization (2403.10298v1)

Published 15 Mar 2024 in cs.CV

Abstract: Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, little effort has been devoted to assessing the quality of the extracted visual representations. Intuitively, the network may struggle to capture discriminative features from low-quality samples, leading to a significant decline in FGVC performance. To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. In this network, to model the spatial contextual relationship between rich part descriptors and global semantics and thereby capture more discriminative details within the object, we design a novel multi-part and multi-scale cross-attention (MPMSCA) module. Before features are fed to the MPMSCA module, a part navigator is developed to address scale confusion and accurately identify locally distinctive regions. Furthermore, we propose a generic multi-level semantic quality evaluation module (MLSQE) to progressively supervise and enhance hierarchical semantics from different levels of the backbone network. Finally, context-aware features from MPMSCA and semantically enhanced features from MLSQE are fed into corresponding quality probing classifiers to evaluate their quality in real time, thus boosting the discriminability of the feature representations. Comprehensive experiments on four popular and highly competitive FGVC datasets demonstrate the superiority of the proposed CSQA-Net over state-of-the-art methods.
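The core idea behind the MPMSCA module is that part descriptors query the global semantic features through cross-attention, so each part is enriched with object-level context. The abstract does not give implementation details, so the following is only a minimal NumPy sketch of standard single-head scaled dot-product cross-attention under that assumption; the function and argument names (`cross_attention`, `parts`, `global_feats`) are hypothetical, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(parts, global_feats):
    """Hypothetical sketch: part descriptors attend to global semantic tokens.

    parts:        (num_parts, d)  query descriptors from local regions
    global_feats: (num_tokens, d) key/value tokens from the global feature map
    Returns context-aware part features of shape (num_parts, d).
    """
    d = parts.shape[-1]
    # Similarity of each part query to every global token, scaled by sqrt(d).
    scores = parts @ global_feats.T / np.sqrt(d)   # (num_parts, num_tokens)
    attn = softmax(scores, axis=-1)                # rows sum to 1
    # Each part becomes a weighted mixture of global context tokens.
    return attn @ global_feats                     # (num_parts, d)
```

A real implementation would add learned query/key/value projections, multiple heads, and the multi-scale part inputs the paper describes; this sketch only illustrates the attention mechanism that couples parts with global semantics.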

Authors (5)
  1. Qin Xu
  2. Sitong Li
  3. Jiahui Wang
  4. Bo Jiang
  5. Jinhui Tang
