
Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning (2402.12886v1)

Published 20 Feb 2024 in cs.GR

Abstract: Rendering photo-realistic novel-view images of complex scenes has been a long-standing challenge in computer graphics. In recent years, great research progress has been made on enhancing rendering quality and accelerating rendering speed in the realm of view synthesis. However, when rendering complex dynamic scenes with sparse views, the rendering quality remains limited due to occlusion problems. Besides, for rendering high-resolution images on dynamic scenes, the rendering speed is still far from real-time. In this work, we propose a generalizable view synthesis method that can render high-resolution novel-view images of complex static and dynamic scenes in real-time from sparse views. To address the occlusion problems arising from the sparsity of input views and the complexity of captured scenes, we introduce an explicit 3D visibility reasoning approach that can efficiently estimate the visibility of sampled 3D points to the input views. The proposed visibility reasoning approach is fully differentiable and can gracefully fit inside the volume rendering pipeline, allowing us to train our networks with only multi-view images as supervision while refining geometry and texture simultaneously. Besides, each module in our pipeline is carefully designed to bypass the time-consuming MLP querying process and enhance the rendering quality of high-resolution images, enabling us to render high-resolution novel-view images in real-time. Experimental results show that our method outperforms previous view synthesis methods in both rendering quality and speed, particularly when dealing with complex dynamic scenes with sparse views.

References (46)
  1. B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” in European Conference on Computer Vision, 2020, pp. 405–421.
  2. A. Pumarola, E. Corona, G. Pons-Moll, and F. Moreno-Noguer, “D-NeRF: Neural radiance fields for dynamic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021. [Online]. Available: http://arxiv.org/abs/2011.13961v1
  3. B. Attal, J.-B. Huang, C. Richardt, M. Zollhoefer, J. Kopf, M. O’Toole, and C. Kim, “Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16610–16620.
  4. C. Gao, A. Saraf, J. Kopf, and J.-B. Huang, “Dynamic view synthesis from dynamic monocular video,” arXiv preprint arXiv:2105.06468, 2021.
  5. A. Yu, V. Ye, M. Tancik, and A. Kanazawa, “PixelNeRF: Neural radiance fields from one or few images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4578–4587.
  6. Q. Wang, Z. Wang, K. Genova, P. P. Srinivasan, H. Zhou, J. T. Barron, R. Martin-Brualla, N. Snavely, and T. Funkhouser, “IBRNet: Learning multi-view image-based rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4690–4699.
  7. H. Lin, S. Peng, Z. Xu, Y. Yan, Q. Shuai, H. Bao, and X. Zhou, “Efficient neural radiance fields for interactive free-viewpoint video,” in SIGGRAPH Asia 2022 Conference Papers, 2022, pp. 1–9.
  8. Y. Liu, S. Peng, L. Liu, Q. Wang, P. Wang, C. Theobalt, X. Zhou, and W. Wang, “Neural rays for occlusion-aware image-based rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7824–7833.
  9. A. Chen, Z. Xu, F. Zhao, X. Zhang, F. Xiang, J. Yu, and H. Su, “MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo,” in Proceedings of the IEEE International Conference on Computer Vision, 2021.
  10. S. J. Garbin, M. Kowalski, M. Johnson, J. Shotton, and J. Valentin, “FastNeRF: High-fidelity neural rendering at 200fps,” arXiv preprint arXiv:2103.10380, 2021. [Online]. Available: http://arxiv.org/abs/2103.10380v2
  11. P. Hedman, P. P. Srinivasan, B. Mildenhall, J. T. Barron, and P. Debevec, “Baking neural radiance fields for real-time view synthesis,” arXiv preprint arXiv:2103.14645, 2021. [Online]. Available: http://arxiv.org/abs/2103.14645v1
  12. Z. Chen, T. Funkhouser, P. Hedman, and A. Tagliasacchi, “Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures,” in The Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  13. T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” arXiv preprint arXiv:2201.05989, 2022.
  14. R. Jensen, A. Dahl, G. Vogiatzis, E. Tola, and H. Aanæs, “Large scale multi-view stereopsis evaluation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 406–413.
  15. B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, and A. Kar, “Local light field fusion: Practical view synthesis with prescriptive sampling guidelines,” ACM Transactions on Graphics (TOG), vol. 38, no. 4, pp. 1–14, 2019.
  16. Z. Wang, L. Li, Z. Shen, L. Shen, and L. Bo, “4k-nerf: High fidelity neural radiance fields at ultra high resolutions,” arXiv preprint arXiv:2212.04701, 2022.
  17. J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan, “Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields,” in Proceedings of the IEEE International Conference on Computer Vision, 2021. [Online]. Available: http://arxiv.org/abs/2103.13415v3
  18. J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, “Mip-nerf 360: Unbounded anti-aliased neural radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5470–5479.
  19. K. Deng, A. Liu, J.-Y. Zhu, and D. Ramanan, “Depth-supervised nerf: Fewer views and faster training for free,” arXiv preprint arXiv:2107.02791, 2021.
  20. B. Roessle, J. T. Barron, B. Mildenhall, P. P. Srinivasan, and M. Nießner, “Dense depth priors for neural radiance fields from sparse input views,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12892–12901.
  21. D. Verbin, P. Hedman, B. Mildenhall, T. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-NeRF: Structured view-dependent appearance for neural radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
  22. J. L. Schönberger, E. Zheng, J.-M. Frahm, and M. Pollefeys, “Pixelwise view selection for unstructured multi-view stereo,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III. Springer, 2016, pp. 501–518.
  23. J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition, 2016.
  24. K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. B. Goldman, S. M. Seitz, and R. Martin-Brualla, “Nerfies: Deformable neural radiance fields,” in Proceedings of the IEEE International Conference on Computer Vision, 2021. [Online]. Available: http://arxiv.org/abs/2011.12948v5
  25. L. Liu, J. Gu, K. Z. Lin, T.-S. Chua, and C. Theobalt, “Neural sparse voxel fields,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020. [Online]. Available: http://arxiv.org/abs/2007.11571v2
  26. C. Reiser, S. Peng, Y. Liao, and A. Geiger, “Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps,” in Proceedings of the IEEE International Conference on Computer Vision, 2021. [Online]. Available: http://arxiv.org/abs/2103.13744v2
  27. D. Rebain, W. Jiang, S. Yazdani, K. Li, K. M. Yi, and A. Tagliasacchi, “Derf: Decomposed radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14153–14161.
  28. Z. Li, S. Niklaus, N. Snavely, and O. Wang, “Neural scene flow fields for space-time view synthesis of dynamic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 6498–6508.
  29. Y. Du, Y. Zhang, H.-X. Yu, J. B. Tenenbaum, and J. Wu, “Neural radiance flow for 4D view synthesis and video processing,” in Proceedings of the IEEE International Conference on Computer Vision, 2021. [Online]. Available: http://arxiv.org/abs/2012.09790v2
  30. K. Pulli, H. Hoppe, M. Cohen, L. Shapiro, T. Duchamp, and W. Stuetzle, “View-based rendering: Visualizing real objects from scanned range and color data,” in Rendering Techniques’ 97: Proceedings of the Eurographics Workshop in St. Etienne, France, June 16–18, 1997. Springer, 1997, pp. 23–34.
  31. K. C. Zheng, A. Colburn, A. Agarwala, M. Agrawala, D. Salesin, B. Curless, and M. F. Cohen, “Parallax photography: creating 3d cinematic effects from stills,” in Proceedings of Graphics Interface 2009, 2009, pp. 111–118.
  32. C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” ACM transactions on graphics (TOG), vol. 23, no. 3, pp. 600–608, 2004.
  33. E. Penner and L. Zhang, “Soft 3D reconstruction for view synthesis,” ACM Transactions on Graphics (TOG), vol. 36, no. 6, pp. 1–11, 2017.
  34. P. Hedman, J. Philip, T. Price, J.-M. Frahm, G. Drettakis, and G. Brostow, “Deep blending for free-viewpoint image-based rendering,” ACM Transactions on Graphics (TOG), vol. 37, no. 6, pp. 1–15, 2018.
  35. G. Riegler and V. Koltun, “Free view synthesis,” in European Conference on Computer Vision, 2020, pp. 623–640.
  36. R. A. Drebin, L. Carpenter, and P. Hanrahan, “Volume rendering,” ACM Siggraph Computer Graphics, vol. 22, no. 4, pp. 65–74, 1988.
  37. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision. Springer, 2016, pp. 694–711.
  38. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  39. G. Riegler and V. Koltun, “Stable view synthesis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
  40. M. Suhail, C. Esteves, L. Sigal, and A. Makadia, “Light field neural rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8269–8279.
  41. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in neural information processing systems, vol. 32, 2019.
  42. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  43. R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.
  44. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  45. Z. Su, T. Zhou, K. Li, D. Brady, and Y. Liu, “View synthesis from multi-view rgb data using multilayered representation and volumetric estimation,” Virtual Reality & Intelligent Hardware, vol. 2, no. 1, pp. 43–55, 2020.
  46. T. Zhou, J. Huang, T. Yu, R. Shao, and K. Li, “Hdhuman: High-quality human novel-view rendering from sparse views,” IEEE Transactions on Visualization and Computer Graphics, 2023.

Summary

  • The paper introduces an explicit 3D visibility reasoning framework that significantly improves occlusion handling in both dynamic and static scenes.
  • It integrates geometry and texture volumes to reconstruct scene structure and aggregate multi-view features, reducing reliance on computationally intensive MLP queries.
  • The method achieves real-time, high-resolution rendering (up to 1920×1080) with competitive quality on standard benchmarks, enabling immersive interactive applications.

Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning

Introduction

Novel-view rendering has long been a pivotal area of research in computer graphics, aiming to enable immersive user experiences akin to real-world navigation. Neural Radiance Fields (NeRF) have emerged as a leading approach, delivering photo-realistic results via MLP-based 3D scene representations. However, traditional NeRF methods require separate network training for each scene, posing challenges for dynamic scenes and for real-time, high-resolution rendering.
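For background, NeRF-style pipelines (including this one) rely on the standard discrete volume rendering formulation from the NeRF literature, which composites per-sample densities $\sigma_i$ and colors $\mathbf{c}_i$ along each camera ray; this is textbook material rather than a contribution of this paper:

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i, \qquad T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr),$$

where $\delta_i$ is the spacing between adjacent samples and $T_i$ is the accumulated transmittance toward the target view. The paper's explicit visibility reasoning is conceptually related: whether a sampled 3D point is visible to an input view amounts to asking whether accumulated density blocks the line of sight toward that view's camera.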

In "Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning" (2402.12886), the authors propose a novel framework to address these constraints, offering real-time rendering of high-resolution images from sparse views of both static and dynamic scenes. The key innovation lies in explicit 3D visibility reasoning, which significantly improves visibility estimation and rendering quality, particularly in occluded regions.

Methodology

The proposed method distinguishes itself through several core components:

  1. Explicit 3D Visibility Reasoning: The method efficiently estimates the visibility of sampled 3D points using an explicitly constructed volume. Unlike implicit methods relying on MLPs, this approach provides global consistency and fits seamlessly into the volume rendering pipeline.
  2. Volume and Feature Integration: The pipeline is structured around discretized geometry volumes and continuous texture volumes. Geometry volumes facilitate initial geometry reconstruction, which informs visibility reasoning. In contrast, texture volumes use these insights for enhanced multi-view feature aggregation and rendering.
  3. Ray Integration and Rendering: By integrating rays within the feature space and employing a 2D convolutional neural network (CNN) for final rendering, the approach bypasses computationally intensive MLP queries. This design choice markedly reduces rendering times while enhancing output quality (see Figure 1 below; a code sketch of the overall weighting scheme follows the figure).

    Figure 1: Our method generally shows competitive rendering results with the baselines, with better results in occluded areas due to explicit 3D visibility reasoning.
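Below is a minimal PyTorch-style sketch of the general idea: estimate per-point visibility toward each source view by accumulating transmittance through an explicit density (geometry) volume, use those visibilities to weight multi-view features, and composite the aggregated features along target rays. This is an illustrative sketch under stated assumptions, not the authors' implementation; all function names, tensor shapes, the trilinear grid lookup, and the simple weighted mean are assumptions.

```python
import torch
import torch.nn.functional as F

def sample_density(density_grid, pts):
    """Trilinearly sample a density grid at points normalized to [-1, 1]^3.

    density_grid: (1, 1, D, H, W); pts: (M, 3) -> (M,)
    """
    grid = pts.view(1, -1, 1, 1, 3)                      # grid_sample expects (N, D, H, W, 3)
    vals = F.grid_sample(density_grid, grid, align_corners=True)
    return vals.view(-1)

def visibility_to_view(density_grid, pts, cam_center, n_steps=32):
    """Transmittance from each 3D point toward one source camera.

    Marches from the point to the camera center; a value near 1 means the
    point is unoccluded in that view, near 0 means it is blocked.
    """
    dirs = cam_center[None] - pts                        # (N, 3)
    dists = dirs.norm(dim=-1, keepdim=True)              # (N, 1)
    dirs = dirs / dists
    t = torch.linspace(0.0, 1.0, n_steps, device=pts.device)[None, :, None]
    march = pts[:, None, :] + dirs[:, None, :] * dists[:, None, :] * t   # (N, S, 3)
    sigma = sample_density(density_grid, march.reshape(-1, 3)).view(-1, n_steps)
    dt = dists / n_steps                                 # per-point step length
    return torch.exp(-(sigma * dt).sum(dim=-1))          # (N,)

def aggregate_features(src_feats, vis):
    """Visibility-weighted mean of per-view features.

    src_feats: (V, N, C), vis: (V, N) -> (N, C)
    """
    w = vis / (vis.sum(dim=0, keepdim=True) + 1e-6)
    return (w[..., None] * src_feats).sum(dim=0)

def composite_along_rays(agg_feats, sigma, dt):
    """Standard volume-rendering weights applied to aggregated features.

    agg_feats: (R, S, C), sigma: (R, S), dt: scalar step -> (R, C)
    """
    alpha = 1.0 - torch.exp(-sigma * dt)
    ones = torch.ones_like(alpha[:, :1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                              # (R, S)
    return (weights[..., None] * agg_feats).sum(dim=1)
```

In the paper's pipeline, the per-ray features produced this way would be assembled into a 2D feature map over the target image and decoded by the 2D CNN mentioned above, which is what lets the method avoid per-sample MLP color queries while remaining fully differentiable.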

Results

The experimental evaluation demonstrates that the method outperforms traditional approaches in rendering quality and speed, particularly for complex scenes with significant occlusions:

  • Static Scenes: On datasets such as DTU and Real Forward-facing, the method achieves competitive or superior performance with only brief fine-tuning, showcasing robustness across varied textures and illuminations.
  • Dynamic Scenes: The explicit visibility reasoning enables superior handling of dynamic scenes with severe occlusions, as evidenced by comparative results against methods like ENeRF and NeuRay. The rendering speed remains real-time, even at high resolutions up to 1920×1080 (see Figure 2 below).

    Figure 2: Our method demonstrates high-quality rendering in dynamic scenes, especially on occluded edges and areas, outperforming methods using average pooling operations.

Implications and Future Directions

The research introduces significant advancements in rendering complex, dynamic scenes with high fidelity and reduced computational costs. Practical applications span VR environments, real-time visualization in media production, and advanced simulations requiring rapid scene adaptability.

Future research may explore further optimizations of the volume integration techniques and adaptive learning strategies under varied input conditions. Extending support to even larger-scale dynamic scenes with more complex motion patterns also remains an intriguing challenge.

Conclusion

The proposed method advances high-resolution view synthesis by integrating explicit 3D visibility reasoning into the rendering pipeline. This not only enhances rendering quality but also significantly accelerates rendering, supporting immersive real-time applications across diverse fields. By addressing occlusion challenges effectively, the framework provides a strong foundation for future work in real-time view synthesis and visualization.
