Emergent Mind

FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields

(2401.05516)
Published Jan 10, 2024 in cs.CV, cs.AI, and cs.GR

Abstract

We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior arts required tedious per-style/-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D neural radiance field, which inherits AdaIN's feed-forward stylization machinery, supporting arbitrary style reference images. Furthermore, FPRF supports multi-reference stylization with the semantic correspondence matching and local AdaIN, which adds diverse user control for 3D scene styles. FPRF also preserves multi-view consistency by applying semantic matching and style transfer processes directly onto queried features in 3D space. In experiments, we demonstrate that FPRF achieves favorable photorealistic quality 3D scene stylization for large-scale scenes with diverse reference images. Project page: https://kim-geonu.github.io/FPRF/

Overview

  • Introduces FPRF, an efficient technique for applying styles to large 3D scenes using a single-stage training process.

  • Utilizes Adaptive Instance Normalization (AdaIN) within a 3D neural radiance field for direct style manipulation and multi-view consistency.

  • Introduces a novel style dictionary for multi-reference style application, enhancing scene diversity representation.

  • Demonstrates the ability to maintain consistent style application across different viewpoints with high-quality results.

  • Offers potential for advancements in virtual reality, 3D visualization, and augmented reality through scalable, dynamic style transfer.

Overview of FPRF Methodology

The Feed-Forward Photorealistic Style Transfer (FPRF) method applies photorealistic styles to large-scale 3D scenes, such as cityscapes, without the extensive per-style or per-scene optimization that previously restricted this task to smaller scales. Unlike prior methods, which generally require a complex, resource-intensive optimization run for each new style or scene, FPRF uses a single-stage training process and thereafter accepts arbitrary style references in a direct feed-forward manner, saving substantial compute time and effort.

Innovations in 3D Style Transfer

FPRF leverages Adaptive Instance Normalization (AdaIN), a technique proven effective in earlier 2D style transfer work, and applies it to a style-decomposed 3D neural radiance field. By embedding AdaIN within the 3D representation, FPRF performs style manipulation directly in 3D space. This is particularly powerful because it preserves multi-view consistency across different perspectives of the scene, a property that is non-trivial for methods that stylize rendered 2D images. Moreover, FPRF tackles multi-reference stylization with a novel style dictionary composed of local semantic codes and local style codes derived from multiple style references, enabling it to capture the diversity of a large-scale scene more effectively than single-reference methods.
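As background, the AdaIN operation referenced above re-scales content features so their channel-wise statistics match those of a style reference. A minimal NumPy sketch (the function name, feature shapes, and `eps` stabilizer are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def adain(content_feats, style_feats, eps=1e-5):
    """Adaptive Instance Normalization (illustrative sketch).

    Normalizes content features channel-wise, then re-scales and
    re-shifts them with the style features' std and mean.
    content_feats: (N, C) content feature vectors
    style_feats:   (M, C) style reference feature vectors
    """
    c_mu, c_std = content_feats.mean(0), content_feats.std(0) + eps
    s_mu, s_std = style_feats.mean(0), style_feats.std(0) + eps
    return s_std * (content_feats - c_mu) / c_std + s_mu
```

Because the transform is a closed-form affine re-statistics step rather than an optimization, it can be applied feed-forward to any new style reference, which is the property FPRF inherits.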

Technical Foundations

FPRF rests on two key components: a stylizable radiance field and a multi-reference photorealistic style transfer (PST) process. The stylizable radiance field consists of a scene content field and a scene semantic field, which together encode the scene's geometric structure, content features, and semantics. These features are then stylized by a feed-forward PST process that adapts to the statistics of the reference images. The second component, motivated by the need to represent the varied objects spread across a wide 3D space, is the style dictionary mechanism: through semantic correspondence matching, each reference style influences the semantically matching regions of the scene. This addresses the inherent complexity of large-scale scenes, which is an obstacle for existing PST methods.
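A hedged sketch of how semantic correspondence matching could drive a local, multi-reference AdaIN: each 3D point's semantic feature is soft-matched against the dictionary's local semantic codes, and the matched entries' style statistics are blended into per-point normalization targets. All names, the cosine-similarity matching, and the softmax weighting here are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_adain(point_feats, point_sem, style_dict, eps=1e-5):
    """Illustrative multi-reference local AdaIN.

    point_feats: (N, C) content features queried at 3D points
    point_sem:   (N, D) semantic features of the same points
    style_dict:  list of (sem_code (D,), style_mu (C,), style_std (C,))
                 local semantic/style code pairs from the references
    """
    sems = np.stack([s for s, _, _ in style_dict])   # (K, D)
    mus  = np.stack([m for _, m, _ in style_dict])   # (K, C)
    stds = np.stack([d for _, _, d in style_dict])   # (K, C)
    # Cosine similarity between point semantics and dictionary codes.
    p = point_sem / np.linalg.norm(point_sem, axis=1, keepdims=True)
    q = sems / np.linalg.norm(sems, axis=1, keepdims=True)
    weights = softmax(p @ q.T, axis=1)               # (N, K) soft match
    # Blend matched style statistics into per-point targets.
    tgt_mu, tgt_std = weights @ mus, weights @ stds  # (N, C) each
    c_mu, c_std = point_feats.mean(0), point_feats.std(0) + eps
    return tgt_std * (point_feats - c_mu) / c_std + tgt_mu
```

Because the matching and re-statistics steps act on features queried in 3D space rather than on rendered 2D images, the same stylized feature is seen from every viewpoint, which is one way to read the paper's multi-view consistency claim.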

Results and Contributions

In experiments, FPRF demonstrates proficiency in large-scale 3D scene stylization with high-quality photorealistic results. Critically, it does so with diverse reference images while maintaining consistent style application across varying viewpoints. The model's versatility also sets it apart from methods that lack support for multiple style references. Among its contributions, FPRF is the first multi-reference 3D PST method to scale to large scenes efficiently, without the optimization steps typically required for each new style.

FPRF’s advancements signify a promising direction for future virtual reality applications, realistic 3D scene visualizations, and augmented reality experiences where photorealistic style transfer can be applied dynamically and with a great deal of flexibility.
