Renderers are Good Zero-Shot Representation Learners: Exploring Diffusion Latents for Metric Learning (2306.10721v1)

Published 19 Jun 2023 in cs.CV

Abstract: Can the latent spaces of modern generative neural rendering models serve as representations for 3D-aware discriminative visual understanding tasks? We use retrieval as a proxy for measuring the metric learning properties of the latent spaces of Shap-E. This includes whether the latents capture view-independence and whether scene representations can be aggregated from the representations of individual image views. We find that Shap-E representations outperform the classical EfficientNet baseline representations zero-shot, and remain competitive when both methods are trained with a contrastive loss. These findings give a preliminary indication that 3D-based rendering and generative models can yield useful representations for discriminative tasks in our innately 3D-native world. Our code is available at https://github.com/michaelwilliamtang/golden-retriever.
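To make the evaluation idea concrete, here is a minimal sketch of retrieval used as a metric-learning proxy. Each scene is represented by aggregating embeddings of its individual views, and held-out single-view queries are checked against those scene embeddings. This is not the authors' code. Several choices are assumptions made for illustration: mean pooling of L2-normalized view embeddings, cosine similarity, and recall@k as the score. The function names (`aggregate_views`, `recall_at_k`) and the toy 3-D vectors are also hypothetical stand-ins for real Shap-E or EfficientNet latents.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def aggregate_views(view_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical scene embedding: mean of the L2-normalized per-view embeddings."""
    if not view_embeddings:
        raise ValueError("need at least one view embedding")
    normed = [l2_normalize(v) for v in view_embeddings]
    dim = len(normed[0])
    mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def recall_at_k(queries: Dict[str, Vector], gallery: Dict[str, Vector], k: int = 1) -> float:
    """Fraction of queries whose own object id is among the k most similar gallery entries."""
    hits = 0
    for obj_id, q in queries.items():
        ranked = sorted(gallery, key=lambda g: cosine(q, gallery[g]), reverse=True)
        if obj_id in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy per-view embeddings for two objects (illustrative numbers only).
views = {
    "chair": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]],
    "lamp": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.2]],
}
# Gallery: one aggregated embedding per object, built from all views but the first.
gallery = {obj: aggregate_views(vs[1:]) for obj, vs in views.items()}
# Queries: the held-out first view of each object.
queries = {obj: vs[0] for obj, vs in views.items()}
print(f"recall@1 = {recall_at_k(queries, gallery, k=1):.2f}")  # recall@1 = 1.00
```

A view-independent representation should place different views of the same object close together. Under that assumption, a single view retrieves its own object's aggregated embedding and recall@k stays high. Mean pooling is only one simple aggregation choice, and the method used in the paper may differ.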
