
FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields (2401.05516v1)

Published 10 Jan 2024 in cs.CV, cs.AI, and cs.GR

Abstract: We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior arts required tedious per-style/-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D neural radiance field, which inherits AdaIN's feed-forward stylization machinery, supporting arbitrary style reference images. Furthermore, FPRF supports multi-reference stylization with the semantic correspondence matching and local AdaIN, which adds diverse user control for 3D scene styles. FPRF also preserves multi-view consistency by applying semantic matching and style transfer processes directly onto queried features in 3D space. In experiments, we demonstrate that FPRF achieves favorable photorealistic quality 3D scene stylization for large-scale scenes with diverse reference images. Project page: https://kim-geonu.github.io/FPRF/

Authors (3)
  1. Kim Youwang (9 papers)
  2. Tae-Hyun Oh (75 papers)
  3. Geonu Kim (8 papers)
Citations (1)

Summary

  • The paper presents a feed-forward method that applies photorealistic styles to large-scale 3D scenes without per-style or per-scene optimization.
  • It embeds Adaptive Instance Normalization (AdaIN) in a stylizable radiance field, enabling multi-view-consistent photorealistic style transfer.
  • It introduces a multi-reference style dictionary of local semantic and style codes, giving users flexible, semantics-aware control over scene styles.

Overview of FPRF Methodology

The Feed-Forward Photorealistic Style Transfer (FPRF) method applies photorealistic styles to large-scale 3D scenes, such as cityscapes, without the per-style or per-scene optimization that previously restricted this task to small scenes. Unlike prior methods, which require a resource-intensive optimization run for each new style or scene, FPRF trains in a single stage and then accepts arbitrary style references in a direct feed-forward pass, saving substantial compute.

Innovations in 3D Style Transfer

FPRF builds on Adaptive Instance Normalization (AdaIN), a technique proven efficient in earlier style transfer work, and applies it to a style-decomposed 3D neural radiance field. Embedding AdaIN within the 3D representation lets FPRF manipulate style directly in 3D space, which in turn preserves multi-view consistency across different viewpoints of the scene; achieving this consistency is nontrivial for image-space methods. FPRF also tackles multi-reference stylization through a novel style dictionary composed of local semantic codes and local style codes derived from multiple reference images, capturing the diversity of a large-scale scene more effectively than single-reference methods.
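For concreteness, the core AdaIN operation simply matches the per-channel mean and standard deviation of content features to those of style features. Below is a minimal PyTorch sketch of that statistic-matching step; FPRF applies the same idea to features queried in 3D space, so the 2D (N, C, H, W) layout used here is an illustrative assumption, not the paper's exact implementation.

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization: re-normalize the per-channel
    statistics of the content features to match those of the style
    features. Expects (N, C, H, W) feature maps."""
    dims = (2, 3)  # spatial dimensions
    c_mean = content.mean(dim=dims, keepdim=True)
    c_std = content.std(dim=dims, keepdim=True) + eps
    s_mean = style.mean(dim=dims, keepdim=True)
    s_std = style.std(dim=dims, keepdim=True) + eps
    # Whiten the content statistics, then re-color with the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean
```

Because the operation is a closed-form affine transform of feature statistics, it runs in a single forward pass, which is what makes the overall pipeline feed-forward rather than optimization-based.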

Technical Foundations

FPRF rests on two key components: a stylizable radiance field and a multi-reference photorealistic style transfer (PST) process. The stylizable radiance field comprises a scene content field and a scene semantic field, which together encode the scene's geometric structure, content features, and semantics. These queried features are stylized in a feed-forward pass that adapts them to the statistics of the reference images. The multi-reference PST process, motivated by the variety of objects spread across a wide 3D space, uses the style dictionary together with semantic correspondence matching so that multiple reference styles each influence their semantically matched regions. This addresses the scene complexity that hampers existing PST methods at large scale.
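A hedged sketch of how semantic correspondence matching and local AdaIN could combine, under assumed shapes and names (none of the identifiers below come from the paper's code): each queried 3D point's semantic feature is softly matched against the dictionary's local semantic codes, and the matched entries' style statistics are blended before being applied to the point's content features.

```python
import torch
import torch.nn.functional as F

def stylize_points(point_sem, point_feat, dict_sem, dict_mean, dict_std,
                   tau=0.1, eps=1e-5):
    """Hypothetical multi-reference local AdaIN over queried 3D features.

    point_sem:  (P, D) semantic features from the scene semantic field
    point_feat: (P, C) content features from the scene content field
    dict_sem:   (K, D) local semantic codes of the style dictionary
    dict_mean, dict_std: (K, C) local style statistics per dictionary entry
    """
    # Soft semantic correspondence between 3D points and dictionary entries.
    sim = F.cosine_similarity(point_sem.unsqueeze(1),
                              dict_sem.unsqueeze(0), dim=-1)  # (P, K)
    weights = F.softmax(sim / tau, dim=-1)                    # (P, K)

    # Per-point target statistics: a weighted blend of matched style codes.
    target_mean = weights @ dict_mean  # (P, C)
    target_std = weights @ dict_std    # (P, C)

    # Local AdaIN applied directly to 3D features, so every rendered view
    # sees the same stylized point and multi-view consistency is preserved.
    c_mean = point_feat.mean(dim=0, keepdim=True)
    c_std = point_feat.std(dim=0, keepdim=True) + eps
    return target_std * (point_feat - c_mean) / c_std + target_mean
```

In this reading, the softmax temperature tau controls how sharply each point commits to a single reference region; the paper's actual matching and normalization rules may differ in detail.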

Results and Contributions

In experiments, FPRF stylizes large-scale 3D scenes with high-quality photorealistic results, handling diverse reference images while keeping the applied style consistent across viewpoints. Its support for multiple style references sets it apart from methods limited to a single reference. Among its contributions, FPRF is the first multi-reference 3D PST method to scale efficiently to large scenes, without the optimization steps typically required for each new style.

FPRF's advances signal a promising direction for virtual reality applications, realistic 3D scene visualization, and augmented reality experiences in which photorealistic style transfer can be applied dynamically and with great flexibility.
