What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs (2401.02411v1)

Published 4 Jan 2024 in cs.CV, cs.AI, cs.GR, and cs.LG

Abstract: 3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries of scenes from collections of 2D images via neural volume rendering. Yet, the significant memory and computational costs of dense sampling in volume rendering have forced 3D GANs to adopt patch-based training or employ low-resolution rendering with post-processing 2D super resolution, which sacrifices multiview consistency and the quality of resolved geometry. Consequently, 3D GANs have not yet been able to fully resolve the rich 3D geometry present in 2D images. In this work, we propose techniques to scale neural volume rendering to the much higher resolution of native 2D images, thereby resolving fine-grained 3D geometry with unprecedented detail. Our approach employs learning-based samplers for accelerating neural rendering for 3D GAN training using up to 5 times fewer depth samples. This enables us to explicitly "render every pixel" of the full-resolution image during training and inference without post-processing superresolution in 2D. Together with our strategy to learn high-quality surface geometry, our method synthesizes high-resolution 3D geometry and strictly view-consistent images while maintaining image quality on par with baselines relying on post-processing super resolution. We demonstrate state-of-the-art 3D geometric quality on FFHQ and AFHQ, setting a new standard for unsupervised learning of 3D shapes in 3D GANs.

Authors (8)
  1. Alex Trevithick (8 papers)
  2. Matthew Chan (7 papers)
  3. Towaki Takikawa (13 papers)
  4. Umar Iqbal (50 papers)
  5. Shalini De Mello (45 papers)
  6. Manmohan Chandraker (108 papers)
  7. Ravi Ramamoorthi (65 papers)
  8. Koki Nagano (27 papers)
Citations (5)

Summary

  • The paper introduces an SDF-based 3D GAN architecture that improves high-resolution 3D geometry rendering by reducing depth sample requirements up to fivefold.
  • It employs learning-based samplers that concentrate depth samples where the scene content actually lies, yielding strictly multi-view-consistent outputs without 2D super-resolution post-processing.
  • The method achieves state-of-the-art view-consistent results on datasets like FFHQ and AFHQ, advancing unsupervised 3D shape learning.

Introduction

Generative Adversarial Networks (GANs) have advanced dramatically over the last decade, especially in image synthesis. One of the intriguing developments in this area is 3D-aware GANs, which learn to recreate 3D scenes and geometries from collections of 2D images. These 3D GANs rely on neural volume rendering, which has historically imposed heavy computational and memory demands, making it hard to achieve high-resolution outputs that preserve both geometric detail and multi-view consistency. This paper presents a method to overcome these limitations, allowing full-resolution rendering that captures fine-grained 3D geometry without sacrificing image quality.

Scaling Challenges in 3D GANs

Previously, 3D GANs encountered roadblocks in scaling to high resolutions due to the intense resource requirements of volume rendering. For example, rendering a 512x512 pixel image might necessitate evaluating tens of millions of depth samples, demanding an impractical amount of GPU memory. To cope, researchers often used patch-based approaches or combined low-resolution neural rendering with 2D super-resolution (SR) techniques. Unfortunately, these workarounds compromised the consistency between different views and did not fully resolve 3D details.
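
To make the scale of the problem concrete, here is a back-of-the-envelope estimate in Python. The 512x512 resolution comes from the text above; the per-ray sample count and per-sample feature width are illustrative assumptions, not values from the paper.

```python
# Rough cost of dense volume rendering at full resolution.
# samples_per_ray and feature_dim are illustrative assumptions.
resolution = 512          # full-resolution image side (from the text above)
samples_per_ray = 96      # dense sampling budget per ray (assumed)
feature_dim = 32          # features stored per sample for backprop (assumed)
bytes_per_float = 4       # float32

rays = resolution * resolution
total_samples = rays * samples_per_ray
activation_bytes = total_samples * feature_dim * bytes_per_float

print(f"rays:          {rays:,}")            # 262,144
print(f"depth samples: {total_samples:,}")   # 25,165,824
print(f"activations:   {activation_bytes / 2**30:.1f} GiB")  # 3.0 GiB per stored tensor
```

Multiply that last figure by the number of network layers whose activations must be kept for backpropagation and by the batch size, and a single training step can demand hundreds of gigabytes, far beyond one GPU. This is why earlier methods fell back on low-resolution rendering plus 2D upsampling.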

Innovations in High-Fidelity Rendering

The new approach directly addresses the core issues that have held back high-resolution rendering in 3D GANs. The researchers present a set of techniques that constitute an end-to-end pipeline:

  • SDF-based 3D GAN Architecture: Instead of a density-based radiance field, the model represents scenes with a Signed Distance Function (SDF), which captures high-frequency geometry and encourages well-defined surfaces during training.
  • Learning-based Samplers: Learned samplers predict where along each ray the scene needs to be evaluated, reducing the number of depth samples required by up to five times (a toy sketch of the underlying coarse-to-fine idea follows this list).
  • Robust Sampling Strategy: The proposed sampling strategy ensures stable rendering with significantly fewer depth samples, maintaining image quality without resorting to super-resolution post-processing.
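
To build intuition for how far fewer samples can suffice, the sketch below shows coarse-to-fine importance sampling along a single ray. It is not the paper's implementation: the paper trains proposal networks to predict sample placement, whereas this toy substitutes a cheap coarse SDF pass for the proposal, uses a StyleSDF-style sigmoid to convert signed distance to density, and evaluates an assumed sphere scene; all constants are illustrative.

```python
import numpy as np

def sphere_sdf(x, radius=1.0):
    """Toy scene: signed distance from points x to a sphere at the origin."""
    return np.linalg.norm(x, axis=-1) - radius

def sdf_to_density(sdf, alpha=100.0):
    """StyleSDF-style conversion: density ramps up sharply across the zero level set."""
    return alpha / (1.0 + np.exp(np.clip(alpha * sdf, -50.0, 50.0)))

def importance_sample(origin, direction, near=0.1, far=4.0,
                      n_coarse=16, n_fine=8, seed=0):
    rng = np.random.default_rng(seed)

    # 1) Coarse pass: a few evenly spaced depth samples along the ray.
    t = np.linspace(near, far, n_coarse)
    sigma = sdf_to_density(sphere_sdf(origin + t[:, None] * direction))

    # 2) Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i));
    #    they peak at the first surface crossing, i.e. where to refine.
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))
    a = 1.0 - np.exp(-sigma * delta)
    T = np.cumprod(np.concatenate([[1.0], 1.0 - a[:-1]]))
    w = T * a + 1e-5

    # 3) Inverse-transform sampling from the weight CDF, so the small fine
    #    budget lands near the surface instead of in empty space.
    cdf = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cdf, rng.uniform(size=n_fine))
    t_fine = t[np.clip(idx, 0, n_coarse - 1)]
    return np.sort(np.concatenate([t, t_fine]))

ts = importance_sample(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
print(ts)  # the 8 fine samples cluster near t = 2, where the ray first hits the sphere
```

The point of the exercise is the concentration effect: once a cheap pass has located the surface, a handful of well-placed samples can replace dozens of uniform ones, which is how the paper's learned samplers cut the depth-sample budget by up to 5x.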

Impressive Results

Empirical demonstrations show that this methodology generates state-of-the-art 3D geometry quality on standard datasets such as FFHQ and AFHQ. The model produces strictly view-consistent images with a level of geometric detail previously unseen in unsupervised 3D shape learning with GANs, while keeping image quality on par with baselines that rely on 2D super-resolution post-processing.

Conclusion

The proposed research represents a significant leap in the field of 3D-aware GANs, bridging the gap between 2D image quality and 3D geometric accuracy. With these advancements, these GANs are poised to power a range of applications from content creation to novel view synthesis, providing tools that could transform industries reliant on 3D modeling and visualization. As the technology progresses, methods like this will continue to push the boundaries of what is possible at the intersection of artificial intelligence and 3D graphics.