Emergent Mind

Abstract

3D Gaussian Splatting has emerged as a very promising scene representation, achieving state-of-the-art quality in novel view synthesis significantly faster than competing alternatives. However, its use of spherical harmonics to represent scene colors limits the expressivity of the 3D Gaussians and, as a consequence, the capability of the representation to generalize as we move away from the training views. In this paper, we propose to encode the color information of 3D Gaussians into per-Gaussian feature vectors, which we denote as Feature Splatting (FeatSplat). To synthesize a novel view, Gaussians are first "splatted" into the image plane, then the corresponding feature vectors are alpha-blended, and finally the blended vector is decoded by a small MLP to render the RGB pixel values. To further inform the model, we concatenate a camera embedding to the blended feature vector, conditioning the decoding also on viewpoint information. Our experiments show that this novel model for encoding radiance considerably improves novel view synthesis for low-overlap views that are distant from the training views. Finally, we also show the capacity and convenience of our feature vector representation, demonstrating its capability not only to generate RGB values for novel views, but also their per-pixel semantic labels. We will release the code upon acceptance.

Keywords: Gaussian Splatting, Novel View Synthesis, Feature Splatting

Figure: Comparison of novel view synthesis between FeatSplat-16, FeatSplat-32, and 3DGS.

Overview

  • The paper introduces Feature Splatting (FeatSplat) to improve 3D scene representation by replacing spherical harmonics in 3D Gaussian Splatting (3DGS) with per-Gaussian feature vectors, enhancing color information encoding.

  • FeatSplat's methodology involves feature vector encoding, alpha blending, and multi-layer perceptron (MLP) decoding, and extends to semantic segmentation, demonstrating robustness across various datasets such as Mip-360 and ScanNet++.

  • Experiments show FeatSplat achieves better performance in terms of PSNR, SSIM, and memory usage, though with slightly slower rendering speeds compared to 3DGS, making it valuable for applications in novel view synthesis and semantic segmentation.

An Overview of Feature Splatting for Enhanced 3D Scene Representation

The paper introduces a novel approach to improving 3D scene representations, specifically targeting limitations of traditional methods such as Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS). The proposed method, termed Feature Splatting (FeatSplat), addresses the key limitation of spherical harmonics in 3DGS by adopting per-Gaussian feature vectors to encode color information. This overview explores the methodology, experiments, and implications of the approach, providing a synthesis of the paper's contributions.

Introduction

The challenge of finding appropriate 3D scene representations is pivotal for applications in robotics, virtual reality (VR), and augmented reality (AR). Traditional scene representations like NeRFs are computationally intensive and scale poorly with scene size. The recent introduction of 3D Gaussian Splatting (3DGS) presents a faster alternative, but it relies on spherical harmonics to encode color, which limits the model's expressivity and generalization capabilities.

Methodology

Feature Splatting (FeatSplat) enhances 3DGS by replacing spherical harmonics with per-Gaussian feature vectors. This approach involves three key steps:

  1. Feature Vector Encoding: Each 3D Gaussian is initialized with a feature vector sampled from a normal distribution.
  2. Alpha Blending: During image synthesis, 3D Gaussians are projected into the image plane, and their corresponding feature vectors are alpha-blended.
  3. Multi-Layer Perceptron (MLP) Decoding: The blended feature vector is concatenated with a camera embedding and decoded by a small MLP to render RGB pixel values.
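Steps 2 and 3 can be sketched in a few lines. The following is a minimal NumPy illustration with hypothetical dimensions, function names, and weights, not the authors' implementation:

```python
import numpy as np

def alpha_blend_features(feats, alphas):
    """Front-to-back alpha blending of per-Gaussian feature vectors
    along one pixel ray (feats: (N, D) sorted by depth, alphas: (N,))."""
    blended = np.zeros(feats.shape[1])
    transmittance = 1.0
    for f, a in zip(feats, alphas):
        blended += transmittance * a * f
        transmittance *= 1.0 - a
    return blended

def decode_pixel(blended_feat, cam_embedding, w1, b1, w2, b2):
    """Decode one pixel: concatenate the blended feature vector with a
    camera embedding and pass it through a tiny 2-layer MLP to get RGB."""
    x = np.concatenate([blended_feat, cam_embedding])
    h = np.maximum(w1 @ x + b1, 0.0)             # hidden layer, ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # sigmoid -> RGB in [0, 1]
```

In the full method, decoding runs for every pixel (in practice as one batched MLP pass over the splatted feature image), and the MLP weights and camera embeddings are optimized jointly with the Gaussians.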

The paper also extends FeatSplat to semantic segmentation, demonstrating the flexibility of feature vector representations in encoding both RGB values and per-pixel semantic labels.

Preliminaries: 3D Gaussian Splatting

3DGS uses a set of 3D Gaussians to encode scene geometry and color information through spherical harmonics. The rendering process involves converting SHs to RGB values, projecting Gaussians into 2D, sorting them, and finally alpha-blending their colors. This method achieves high-quality rendering at a significantly lower computational cost compared to NeRFs, though it incurs a higher memory usage and suffers from poor generalization for complex textures and distant viewpoints.
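For reference, the SH-to-RGB conversion in this pipeline looks roughly like the following for the degree-1 case (four coefficients per channel). This is a simplified sketch: the constants are the standard real spherical-harmonic basis values, and the 0.5 offset follows the common 3DGS convention; the full method uses higher SH degrees.

```python
import numpy as np

def sh_to_rgb(sh_coeffs, view_dir):
    """Evaluate degree-1 real spherical harmonics in the viewing
    direction to get a view-dependent color (sh_coeffs: (3, 4))."""
    C0 = 0.28209479177387814   # Y_0^0 basis constant
    C1 = 0.4886025119029199    # degree-1 basis constant
    x, y, z = view_dir / np.linalg.norm(view_dir)
    basis = np.array([C0, -C1 * y, C1 * z, -C1 * x])
    return np.clip(sh_coeffs @ basis + 0.5, 0.0, 1.0)
```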

Experiments

The paper evaluates FeatSplat on several datasets, including Mip-360, Tanks and Temples (T&T), Deep Blending (DB), and ScanNet++. The evaluation metrics included SSIM, PSNR, LPIPS, rendering speed (FPS), and memory usage.
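As a reminder of what the main fidelity metric measures, PSNR is a log-scale function of the mean squared error between the rendered and ground-truth images (sketch below assumes images normalized to [0, 1]):

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((rendered - reference) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```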

Results on Mip-360, T&T, and DB

FeatSplat achieved the best PSNR on all three datasets and showed improved SSIM on two of them. The qualitative results highlighted FeatSplat’s ability to render accurate and detailed images, though it slightly lagged in rendering speed compared to 3DGS. Notably, FeatSplat halved the memory usage compared to 3DGS.

Generalization to Novel Views

FeatSplat demonstrated superior performance in synthesizing novel views far from training distributions, substantially reducing the artifacts seen with 3DGS. This was evident in sequences where the camera moved through isolated or largely unexplored regions, showcasing FeatSplat’s enhanced ability to adapt Gaussians’ representation flexibly based on viewing conditions.

Results on ScanNet++

FeatSplat significantly outperformed both 3DGS and Compact-3DGS across all metrics. This dataset posed a greater challenge with its independent test set trajectories, underscoring FeatSplat’s robustness in lower visual overlap and distant viewpoints.

Semantic FeatSplat

The extension to per-pixel semantic segmentation achieved a weighted mIoU of 0.629 while maintaining high rendering quality (PSNR 24.64, SSIM 0.875, LPIPS 0.244) at 56 FPS. The resulting segmentation maps were coherent, albeit slightly noisy at the edges.
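A frequency-weighted mIoU, one plausible reading of the reported metric (the paper's exact weighting scheme may differ), weights each class's IoU by its pixel frequency in the ground truth:

```python
import numpy as np

def weighted_miou(pred, gt, num_classes):
    """Frequency-weighted mean IoU over integer label maps."""
    total = gt.size
    score = 0.0
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            score += (np.sum(gt == c) / total) * (inter / union)
    return score
```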

Limitations

While FeatSplat significantly enhances representation capacity and generalization, it introduces a trade-off between capacity and speed. The compact MLP limits texture complexity, sometimes resulting in over-smoothing. The increased feature vector dimension also leads to slower rendering speeds compared to 3DGS, though the authors argue it remains within acceptable real-time limits.

Implications and Future Directions

FeatSplat extends the utility of 3DGS by providing a more expressive and flexible scene representation. Its capacity to encode multiple colors within a single Gaussian and condition decoding on viewpoint information offers robust improvements in practical applications like novel view synthesis and semantic segmentation. Future research could explore optimizing the balance between rendering speed and texture complexity, as well as further extending the versatility of feature vector representations in other 3D vision tasks.

By addressing key limitations in existing 3D scene representations, Feature Splatting represents a meaningful advance in enhancing the performance and applicability of novel view synthesis techniques. The authors’ commitment to releasing the code upon acceptance promises further developments and community-driven insights into this promising approach.
