Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning (2407.20600v1)
Abstract: Image recognition is an essential baseline for deep metric learning. Hierarchical knowledge about image classes describes inter-class similarities and dissimilarities, yet fusing this knowledge effectively to enhance image recognition remains a challenging open problem. In this paper, we propose a novel deep metric learning based method that fuses hierarchical prior knowledge about image classes and improves image recognition performance in an end-to-end, supervised regression manner. Existing image classification methods that incorporate deep metric learning mainly exploit qualitative relativity between image classes, i.e., whether sampled images belong to the same class. We also propose a new triplet loss term that exploits quantitative relativity by aligning distances in the model's latent space with those in the knowledge space, and we incorporate it into the proposed dual-modality fusion method. Experimental results indicate that the proposed method improves image recognition performance and outperforms baseline and existing methods on the CIFAR-10, CIFAR-100, Mini-ImageNet, and ImageNet-1K datasets.
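The abstract does not spell out the new loss term, so what follows is only a minimal sketch of the general idea of "quantitative relativity" under stated assumptions. It treats the knowledge space as a toy class hierarchy, where the distance between two classes is the number of tree edges between them. Latent distance is Euclidean. The hinge margin grows with the gap in knowledge distance, and a squared term pulls latent distances toward the scaled knowledge distances. The hierarchy `PARENT`, the functions `knowledge_distance` and `quantitative_triplet_loss`, and the hyperparameters `scale` and `beta` are all hypothetical illustrations, not the authors' formulation.

```python
import math

# Hypothetical toy class hierarchy (child -> parent); an assumption for
# illustration, NOT the knowledge graph used in the paper.
PARENT = {
    "cat": "mammal", "dog": "mammal",
    "car": "vehicle", "truck": "vehicle",
    "mammal": "animal", "animal": "root", "vehicle": "root",
}


def ancestors(node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def knowledge_distance(a, b):
    """Tree distance: edges from a to b through their lowest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {n: i for i, n in enumerate(path_b)}
    for i, n in enumerate(path_a):
        if n in depth_in_b:  # first shared node is the lowest common ancestor
            return i + depth_in_b[n]
    raise ValueError("classes do not share a root")


def quantitative_triplet_loss(emb, labels, a, p, n, scale=1.0, beta=0.1):
    """Hypothetical loss: a hinge triplet term whose margin is the (scaled)
    knowledge-distance gap, plus a squared term aligning latent distances
    with scaled knowledge distances. `scale` and `beta` are assumptions."""
    d_ap = math.dist(emb[a], emb[p])  # latent (Euclidean) distances
    d_an = math.dist(emb[a], emb[n])
    k_ap = knowledge_distance(labels[a], labels[p])
    k_an = knowledge_distance(labels[a], labels[n])
    # Classes farther apart in the hierarchy demand a larger latent gap.
    margin = scale * (k_an - k_ap)
    hinge = max(0.0, d_ap - d_an + margin)
    align = (d_ap - scale * k_ap) ** 2 + (d_an - scale * k_an) ** 2
    return hinge + beta * align


if __name__ == "__main__":
    labels = {0: "cat", 1: "cat", 2: "dog", 3: "truck"}
    aligned = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [2.0, 0.0], 3: [5.0, 0.0]}
    collapsed = {i: [0.0, 0.0] for i in range(4)}
    print(quantitative_triplet_loss(aligned, labels, 0, 2, 3))
    print(quantitative_triplet_loss(collapsed, labels, 0, 1, 3))
```

When latent distances already match the hierarchy (a cat anchor at the origin, a dog at distance 2, and a truck at distance 5), both terms vanish and the loss is 0.0. Collapsing every embedding to one point instead gives 5.0 + 0.1 × 25 = 7.5. A standard triplet loss, by contrast, uses a fixed margin and only the qualitative same-class/different-class distinction.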
- Yunfeng Zhao
- Huiyu Zhou
- Fei Wu
- Xifeng Wu