Emergent Mind

Abstract

Reconstructing deformable tissues from endoscopic videos is essential in many downstream surgical applications. However, existing methods suffer from slow rendering speed, greatly limiting their practical use. In this paper, we introduce EndoGaussian, a real-time endoscopic scene reconstruction framework built on 3D Gaussian Splatting (3DGS). By integrating the efficient Gaussian representation and highly-optimized rendering engine, our framework significantly boosts the rendering speed to a real-time level. To adapt 3DGS for endoscopic scenes, we propose two strategies, Holistic Gaussian Initialization (HGI) and Spatio-temporal Gaussian Tracking (SGT), to handle the non-trivial Gaussian initialization and tissue deformation problems, respectively. In HGI, we leverage recent depth estimation models to predict depth maps of input binocular/monocular image sequences, based on which pixels are re-projected and combined for holistic initialization. In SGT, we propose to model surface dynamics using a deformation field, which is composed of an efficient encoding voxel and a lightweight deformation decoder, allowing for Gaussian tracking with minor training and rendering burden. Experiments on public datasets demonstrate our efficacy against prior SOTAs in many aspects, including better rendering speed (195 FPS real-time, 100× gain), better rendering quality (37.848 PSNR), and less training overhead (within 2 min/scene), showing significant promise for intraoperative surgery applications. Code is available at: https://yifliu3.github.io/EndoGaussian/

EndoGaussian framework with Holistic Gaussian Initialization, Spatio-temporal Gaussian Tracking, and Optimization phases depicted.

Overview

  • The paper introduces EndoGaussian, a framework for real-time 3D reconstruction of deformable tissue in surgical scenes using stereo endoscopic videos.

  • EndoGaussian utilizes Gaussian Splatting and a deformation field to model tissue deformations, outpacing previous methods based on Neural Radiance Fields.

  • The framework features two core innovations, Voxel-based Gaussian Tracking (called Spatio-temporal Gaussian Tracking, SGT, in the abstract) and Holistic Gaussian Initialization (HGI), enabling efficient and accurate dynamic scene tracking.

  • Performance evaluation shows that EndoGaussian renders at up to 195 FPS while maintaining high-quality reconstructions, with a PSNR of 35.925.

  • The method has potential applications in enhancing intraoperative surgical tools, AR/VR training platforms, and could influence future robotic-assisted surgery protocols.

Introduction

In robotic-assisted minimally invasive surgery (RAMIS), reconstructing 3D models of tissue from stereo endoscopic video is a technology of particular interest. A clear and comprehensive view of the surgical scene helps surgeons plan and execute procedures. Efficient and accurate dynamic scene reconstruction therefore matters for augmented reality (AR) and virtual reality (VR) training platforms, for intraoperative decision-making, and potentially for automating parts of robotic surgery.

EndoGaussian Framework

The paper proposes EndoGaussian, a framework that uses Gaussian Splatting to achieve high-quality, real-time reconstruction of dynamic surgical scenes. It represents deformable tissue as a set of 3D Gaussians and uses a deformation field to predict how they transform over time. This avoids the main limitation of previous methods based on Neural Radiance Fields (NeRFs), whose rendering was prohibitively slow. The deformation field combines a lightweight encoding voxel with a small multilayer perceptron (MLP), so Gaussian deformations can be tracked with little added training or rendering cost.
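To make the idea of an encoding voxel plus a small MLP concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class name `DeformationField`, the six-plane factorization of the (x, y, z, t) grid, the layer sizes, and the random weights are all hypothetical, and only the Gaussian center is deformed (a full model would presumably also adjust rotation and scale).

```python
import random


def bilerp(plane, u, v):
    """Bilinearly interpolate a 2D grid of feature vectors at (u, v) in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    x = min(max(u, 0.0), 1.0) * (rows - 1)
    y = min(max(v, 0.0), 1.0) * (cols - 1)
    x0, y0 = min(int(x), rows - 2), min(int(y), cols - 2)
    fx, fy = x - x0, y - y0
    return [
        (1 - fx) * (1 - fy) * a + fx * (1 - fy) * b + (1 - fx) * fy * c + fx * fy * d
        for a, b, c, d in zip(plane[x0][y0], plane[x0 + 1][y0],
                              plane[x0][y0 + 1], plane[x0 + 1][y0 + 1])
    ]


class DeformationField:
    # Assumption: six 2D planes (xy, xz, yz, xt, yt, zt) stand in for one
    # dense 4D voxel grid over (x, y, z, t).
    AXIS_PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]

    def __init__(self, res=8, feat_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        # Untrained placeholder features and weights; a real model learns these.
        self.planes = [
            [[[rng.uniform(0.5, 1.5) for _ in range(feat_dim)]
              for _ in range(res)] for _ in range(res)]
            for _ in self.AXIS_PAIRS
        ]
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)] for _ in range(hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(3)]

    def encode(self, center, t):
        """Look up one feature vector for a Gaussian center at time t."""
        coords = (*center, t)
        feat = [1.0] * self.feat_dim
        for plane, (a, b) in zip(self.planes, self.AXIS_PAIRS):
            sample = bilerp(plane, coords[a], coords[b])
            feat = [f * s for f, s in zip(feat, sample)]  # fuse planes by product
        return feat

    def deform(self, center, t):
        """Decode the feature with a two-layer MLP into a position offset."""
        feat = self.encode(center, t)
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feat))) for row in self.w1]
        offset = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return tuple(c + o for c, o in zip(center, offset))


field = DeformationField()
center = (0.2, 0.5, 0.7)            # normalized Gaussian center
moved = field.deform(center, t=0.3)  # center after deformation at time 0.3
```

The point of the factorization is cost: six planes at resolution R store on the order of 6R² feature vectors instead of R⁴ for a dense 4D grid, and each query needs only a handful of interpolations followed by a very small decoder.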

Methodological Innovations

EndoGaussian introduces two key components: Voxel-based Gaussian Tracking and Holistic Gaussian Initialization. The framework first models the surgical scene as a set of anisotropic Gaussians with associated attributes, which are then tracked over time to reflect tissue deformation. Voxel-based Gaussian Tracking is the module that encodes scene dynamics. Rather than a computationally heavy network, it uses a decomposed encoding voxel and a tiny MLP, which makes real-time tracking feasible. Holistic Gaussian Initialization builds a robust initial set of Gaussians, avoiding the limitations of conventional initialization strategies such as those that depend on Structure-from-Motion (SfM).
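The initialization step can be pictured as back-projecting depth pixels into 3D and merging the results across frames. The sketch below is a simplified illustration, not the paper's code: it assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a fixed camera pose, depth maps already produced by some depth estimator, and per-frame tissue masks that mark pixels not hidden by instruments. The function names and the "first visible frame wins" merging rule are hypothetical.

```python
def backproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in camera space."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def holistic_init(depth_maps, tissue_masks, fx, fy, cx, cy):
    """Merge back-projected pixels from all frames into one point cloud.

    Each pixel contributes a point from the first frame in which it is
    visible (mask is True) and has a positive depth, so regions occluded in
    one frame can be filled in from another.
    """
    covered = set()
    cloud = []
    for depth, mask in zip(depth_maps, tissue_masks):
        for v, row in enumerate(depth):
            for u, d in enumerate(row):
                if (v, u) in covered or not mask[v][u] or d <= 0:
                    continue
                covered.add((v, u))
                cloud.append(backproject_pixel(u, v, d, fx, fy, cx, cy))
    return cloud


# Two tiny 1x2 "frames": the right pixel is occluded in frame 0 only.
depth_maps = [[[1.0, 2.0]], [[1.5, 4.0]]]
tissue_masks = [[[True, False]], [[True, True]]]
cloud = holistic_init(depth_maps, tissue_masks, fx=2, fy=2, cx=0, cy=0)
```

Each point in `cloud` would then seed one Gaussian center. Because every visible pixel contributes a point, the starting set is dense, in contrast to the sparse feature points that SfM typically recovers.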

Performance Evaluation

On public benchmark datasets, EndoGaussian renders far faster than previous methods without compromising quality. It reaches rendering speeds of up to 195 FPS, roughly a 100-fold speed-up over prior approaches, while holding a PSNR of 35.925 across multiple surgical scene datasets. These results suggest that the method could improve intraoperative surgical tools and make real-time surgical reconstruction practical.
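For readers less familiar with these metrics, the short sketch below shows how PSNR is conventionally computed and what a frame rate means as a per-frame time budget. It assumes pixel intensities normalized to [0, 1]; the function names are illustrative, and the source does not specify the exact evaluation protocol.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def frame_time_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


quality = psnr([0.0, 1.0], [0.1, 0.9])  # every pixel off by 0.1
budget = frame_time_ms(195)             # per-frame time at the reported speed
```

At 195 FPS each frame takes a little over 5 ms to render. If the 100-fold gain is taken at face value, the implied baseline is about 2 FPS, or roughly half a second per frame.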

Concluding Remarks

EndoGaussian addresses the need for fast reconstruction of deformable surgical scenes without sacrificing quality. By making Gaussian tracking lightweight and using an initialization method tailored to dynamic scenes, the framework moves real-time 3D reconstruction closer to use in the operating room. The approach could benefit a range of clinical tasks, including surgical training and intraoperative workflows, and offers a strong baseline for future work in robotic-assisted surgery.
