
Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning (2407.20600v1)

Published 30 Jul 2024 in cs.CV

Abstract: Image recognition is an essential baseline task for deep metric learning. Hierarchical knowledge about image classes describes inter-class similarities and dissimilarities, yet effectively fusing such knowledge to enhance image recognition remains a challenging open topic. In this paper, we propose a novel deep-metric-learning-based method that fuses hierarchical prior knowledge about image classes and enhances image recognition performance in an end-to-end supervised regression manner. Existing deep-metric-learning-based image classification mainly exploits qualitative relativity between image classes, i.e., whether sampled images belong to the same class. We further propose a new triplet loss term that exploits quantitative relativity by aligning distances in the model's latent space with those in the knowledge space, and incorporate it into the proposed dual-modality fusion method. Experimental results indicate that the proposed method enhances image recognition performance and outperforms baseline and existing methods on the CIFAR-10, CIFAR-100, Mini-ImageNet, and ImageNet-1K datasets.
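The quantitative-relativity idea in the abstract — a triplet loss term that aligns latent-space distances with knowledge-space distances — can be illustrated with a minimal sketch. The function name, the squared-error alignment form, and the weighting factor `alpha` are assumptions for illustration, not the paper's actual formulation:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knowledge_aligned_triplet_loss(anchor, positive, negative,
                                   k_ap, k_an, margin=0.2, alpha=1.0):
    """Hypothetical loss combining the classic qualitative triplet term
    with a quantitative term that aligns latent distances with
    hierarchical knowledge distances k_ap (anchor-positive) and
    k_an (anchor-negative)."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    # qualitative relativity: standard triplet margin loss
    triplet = max(0.0, d_ap - d_an + margin)
    # quantitative relativity: regress latent distances onto knowledge distances
    alignment = (d_ap - k_ap) ** 2 + (d_an - k_an) ** 2
    return triplet + alpha * alignment
```

When the latent distances already match the knowledge-tree distances and the margin is satisfied, both terms vanish; any mismatch between the two spaces contributes a squared-error penalty, which matches the abstract's description of a supervised regression objective.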

Authors (4)
  1. Yunfeng Zhao
  2. Huiyu Zhou
  3. Fei Wu
  4. Xifeng Wu
