Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval (2403.00272v1)

Published 1 Mar 2024 in cs.CV, cs.IR, and cs.LG

Abstract: In the context of pose-invariant object recognition and retrieval, we demonstrate that significant improvements in performance can be achieved if both the category-based and the object-identity-based embeddings are learned simultaneously during training. In hindsight, this seems intuitive, since learning about categories is more fundamental than learning about the individual objects that belong to those categories. However, to the best of our knowledge, no prior work in pose-invariant learning has demonstrated this effect. This paper presents an attention-based dual-encoder architecture with specially designed loss functions that optimize the inter- and intra-class distances simultaneously in two different embedding spaces, one for the category embeddings and the other for the object-level embeddings. The proposed loss functions are pose-invariant ranking losses, designed to minimize the intra-class distances and maximize the inter-class distances in the dual representation spaces. We demonstrate the power of our approach on three challenging multi-view datasets: ModelNet-40, ObjectPI, and FG3D. With our dual approach, for single-view object recognition, we outperform the previous best by 20.0% on ModelNet-40, 2.0% on ObjectPI, and 46.5% on FG3D. For single-view object retrieval, we outperform the previous best by 33.7% on ModelNet-40, 18.8% on ObjectPI, and 56.9% on FG3D.

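To make the dual-space ranking objective more concrete, below is a minimal PyTorch sketch of a margin-based ranking (triplet-style) loss applied in two separate embedding spaces, one for category-level embeddings and one for object-level embeddings. This is an illustrative assumption of how such a combined loss could be written, not the paper's exact formulation; the function names, margins, and the weighting between the two spaces are hypothetical.

```python
import torch
import torch.nn.functional as F

def ranking_loss(anchor, positive, negative, margin):
    """Triplet-style margin loss: pull embeddings of the same class
    (e.g. the same object seen from different poses) together and
    push embeddings of other classes at least `margin` farther away."""
    d_pos = F.pairwise_distance(anchor, positive)   # intra-class distance
    d_neg = F.pairwise_distance(anchor, negative)   # inter-class distance
    return F.relu(d_pos - d_neg + margin).mean()

def dual_embedding_loss(cat_emb, obj_emb, triplets_cat, triplets_obj,
                        margin_cat=0.5, margin_obj=0.3, obj_weight=1.0):
    """Sum of a category-space and an object-space ranking loss.

    cat_emb, obj_emb : (N, D) embeddings from the two encoder heads.
    triplets_*       : (T, 3) index tensors of (anchor, positive, negative).
    Margins and weighting here are placeholder values, not the paper's.
    """
    a, p, n = triplets_cat.unbind(dim=1)
    loss_cat = ranking_loss(cat_emb[a], cat_emb[p], cat_emb[n], margin_cat)
    a, p, n = triplets_obj.unbind(dim=1)
    loss_obj = ranking_loss(obj_emb[a], obj_emb[p], obj_emb[n], margin_obj)
    return loss_cat + obj_weight * loss_obj

# Toy usage: 8 images, 16-dim L2-normalized embeddings from each head.
cat_emb = F.normalize(torch.randn(8, 16), dim=1)
obj_emb = F.normalize(torch.randn(8, 16), dim=1)
triplets = torch.tensor([[0, 1, 4], [2, 3, 6]])
print(dual_embedding_loss(cat_emb, obj_emb, triplets, triplets))
```

In this sketch, category triplets pair views of objects from the same category against views from a different category, while object triplets pair views of the same object instance against views of a different object, so minimizing the combined loss simultaneously shapes both representation spaces.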