SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition (2401.17857v3)

Published 31 Jan 2024 in cs.CV

Abstract: 3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed. However, the 3D Gaussians learned by 3D-GS have ambiguous structures without any geometry constraints. This inherent issue in 3D-GS leads to a rough boundary when segmenting individual objects. To remedy these problems, we propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS to improve segmentation accuracy while preserving segmentation speed. Specifically, we introduce a Gaussian Decomposition scheme, which ingeniously utilizes the special structure of 3D Gaussian, finds out, and then decomposes the boundary Gaussians. Moreover, to achieve fast interactive 3D segmentation, we introduce a novel training-free pipeline by lifting a 2D foundation model to 3D-GS. Extensive experiments demonstrate that our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.

Authors

Xu Hu
Yuxi Wang
Lue Fan
Junsong Fan
Junran Peng
Zhen Lei
Qing Li
Zhaoxiang Zhang

Summary

The paper proposes SAGD (referred to below as SA-GS, the name used in an earlier version of the paper), a training-free method that segments 3D Gaussian representations by lifting masks from the 2D SAM model into 3D with cross-view label voting.
It introduces Gaussian Decomposition to refine object boundaries, addressing the roughness typical of 3D Gaussian splat segmentation.
The resulting segmentation is fast enough for interactive use and supports downstream tasks such as scene editing and collision detection in VR/AR.

Introduction

3D scene understanding underpins many applications in virtual reality (VR), augmented reality (AR), and media production. It covers both reconstructing a scene and perceiving the environment from image or video data. Neural Radiance Fields (NeRF) have been successful at this, but their long training times and the difficulty of scaling them to large scenes have motivated alternatives. One such alternative is 3D Gaussian Splatting, which provides high-quality rendering at real-time speeds. It represents a scene as a set of colored 3D Gaussians that can be rendered efficiently into camera views. Despite these advantages, there was no effective technique for parsing this representation into segmented objects, a step that editing and collision detection in the 3D environment both depend on.

Segmenting 3D Gaussians Without Training

The paper introduces SA-GS (Segment Anything in 3D Gaussians), a pipeline that segments objects within the 3D Gaussian representation without any training or learned parameters. It uses the 2D foundation model SAM to generate masks in multiple views, then applies a cross-view label-voting mechanism so that each 3D Gaussian receives one label that is consistent across views. The authors also tackle boundary roughness, which arises because the 3D Gaussians at object boundaries have a non-negligible spatial size, so a single Gaussian can straddle the object and the background. Their remedy, Gaussian Decomposition, identifies these boundary Gaussians and decomposes them to refine the segmented object's boundary.
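The label-voting idea can be illustrated with a small sketch. This is not the authors' implementation: the function name `vote_labels`, the majority `threshold`, and the simplification of testing only each Gaussian's projected center against the 2D mask are all assumptions made for illustration. The projections are taken as given inputs, so no camera model is shown.

```python
from typing import List, Optional, Sequence, Tuple

Mask = Sequence[Sequence[bool]]      # mask[row][col], True = object pixel
Pixel = Optional[Tuple[int, int]]    # (row, col), or None if not visible


def vote_labels(
    masks: Sequence[Mask],
    projections: Sequence[Sequence[Pixel]],
    threshold: float = 0.5,
) -> List[bool]:
    """Label each Gaussian as object (True) or background (False).

    Hypothetical helper, not the paper's API.
    masks[v] is the 2D object mask for view v (e.g. produced by SAM).
    projections[v][g] is the pixel where Gaussian g's center lands in
    view v, or None when the Gaussian is not visible in that view.
    """
    num_gaussians = len(projections[0]) if projections else 0
    hits = [0] * num_gaussians   # views where the center falls inside the mask
    seen = [0] * num_gaussians   # views where the center lands inside the image

    for mask, view_proj in zip(masks, projections):
        height, width = len(mask), len(mask[0])
        for g, pixel in enumerate(view_proj):
            if pixel is None:
                continue
            row, col = pixel
            if not (0 <= row < height and 0 <= col < width):
                continue
            seen[g] += 1
            if mask[row][col]:
                hits[g] += 1

    # Assumed rule: a Gaussian is "object" when more than `threshold` of the
    # views that see it agree.
    return [
        seen[g] > 0 and hits[g] / seen[g] > threshold
        for g in range(num_gaussians)
    ]
```

A Gaussian that is never visible stays unlabeled (False) in this sketch. Voting on centers alone is also exactly where boundary roughness comes from: a large Gaussian whose center lies inside the mask is kept whole even if part of it extends past the object, which is the case Gaussian Decomposition is meant to handle.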

Experimental Results and Applications

The method is evaluated on a range of 3D scenes, and the experiments show that SA-GS produces high-quality 3D segmentation results. SA-GS is also easy to apply to scene editing and collision detection, because the segmentation mask directly identifies which Gaussians belong to an object and can therefore be modified as a group. The authors state that they will release the code, which should help future research and application development.
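To make the editing and collision-detection use concrete, here is a minimal sketch of how a per-Gaussian object mask might be consumed downstream. Everything here is an illustrative assumption rather than the paper's code: the helper names are hypothetical, Gaussians are reduced to their center points, and collision is approximated with axis-aligned bounding boxes that ignore each Gaussian's spatial extent.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner)


def split_by_mask(
    centers: Sequence[Point], is_object: Sequence[bool]
) -> Tuple[List[Point], List[Point]]:
    """Partition Gaussian centers into (object, rest) using the 3D mask."""
    obj = [c for c, keep in zip(centers, is_object) if keep]
    rest = [c for c, keep in zip(centers, is_object) if not keep]
    return obj, rest


def bounding_box(points: Sequence[Point]) -> Box:
    """Axis-aligned bounding box of a non-empty set of points."""
    if not points:
        raise ValueError("bounding_box needs at least one point")
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes intersect on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))
```

Removing an object then amounts to rendering only the `rest` list, and a coarse collision test compares the object's box against another object's box. A real system would likely account for each Gaussian's scale and rotation as well.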

Conclusion

In summary, SA-GS provides an interactive, training-free approach to parsing 3D Gaussian splat representations into objects. By addressing the boundary roughness of segmented objects, it yields masks that are clean enough to use directly for scene editing and collision detection. This makes it a practical building block for real-time applications that manipulate virtual environments.
