- The paper introduces a hybrid two-stage framework that leverages Gaussian Splatting and 2D diffusion models for rapid and high-fidelity 3D generation.
- It overcomes the limitations of traditional Score Distillation Sampling (SDS) by efficiently correcting view inconsistencies and enhancing textural accuracy through progressive repainting.
- Experimental results using metrics like PSNR and LPIPS demonstrate significant improvements in speed and multi-view consistency for 3D rendering.
An Expert Analysis of Repaint123: Accelerating High-Quality Image-to-3D Conversion
Repaint123 introduces a novel framework aimed at mitigating the current limitations of image-to-3D generation, with a focus on improving both speed and quality. The method combines progressive, controllable 2D repainting with Gaussian Splatting and a simple Mean Squared Error (MSE) loss to produce high-quality 3D content from a single image within two minutes. The paper opens with a detailed critique of conventional approaches, highlighting view inconsistency, suboptimal texture rendering, and long generation times, before proposing its solution.
The key technical innovation in Repaint123 is its hybrid use of 2D diffusion models for defect correction and texture enhancement within a two-stage pipeline. In the initial coarse stage, Gaussian Splatting quickly builds a rough 3D model from the input image, prioritizing speed over detail. This foundation feeds the subsequent refine stage, in which a 2D diffusion model progressively repaints the rendered views and corrects their inconsistencies, ensuring texture fidelity and consistency across viewpoints.
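To make the flow concrete, here is a minimal Python sketch of that coarse-to-fine loop as described above. Every function name (`fit_coarse_gaussians`, `render_view`, `repaint_view`, `refine_gaussians_mse`) is a hypothetical placeholder for the purposes of illustration, not the authors' actual API.

```python
# Minimal sketch of the coarse-to-fine pipeline described above.
# All callables passed in are hypothetical placeholders, not the authors' code.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class View:
    azimuth: float    # degrees around the object
    elevation: float  # degrees above/below the equator


def image_to_3d(
    ref_image,                          # single input image
    views: List[View],                  # novel viewpoints to cover progressively
    fit_coarse_gaussians: Callable,     # stage 1: fast Gaussian Splatting fit
    render_view: Callable,              # differentiable 3DGS renderer
    repaint_view: Callable,             # stage 2: 2D diffusion repainting
    refine_gaussians_mse: Callable,     # pixel-wise MSE refinement
):
    # Stage 1 (coarse): quickly optimize a rough set of 3D Gaussians against
    # the reference image; geometry first, fine texture later.
    gaussians = fit_coarse_gaussians(ref_image)

    # Stage 2 (refine): progressively move around the object, repaint each
    # coarse render with a 2D diffusion model conditioned on the reference
    # image, and keep the repainted image as direct pixel supervision.
    repainted = []
    for view in views:
        coarse_render = render_view(gaussians, view)
        repainted.append((view, repaint_view(coarse_render, ref_image, view)))

    # Replace slow SDS optimization with a simple MSE fit to the
    # multi-view-consistent repainted images.
    gaussians = refine_gaussians_mse(gaussians, repainted)
    return gaussians
```

The point of the sketch is the control flow: one fast geometric fit, followed by progressive repainting that turns novel views into direct supervision targets.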
Notably, the method departs from the conventional reliance on Score Distillation Sampling (SDS) because of its inefficiencies: SDS optimization often produces over-saturated, over-smoothed textures, and its refinement process is slow. Repaint123 instead uses the repainting strategy to generate multi-view consistent images directly, avoiding the protracted optimization inherent in SDS-based methods.
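For contrast, the standard SDS gradient (as introduced in DreamFusion) drives optimization through a denoiser residual at every step, whereas a repaint-then-fit strategy can reduce the refine stage to a plain pixel-wise objective over the repainted views. The notation below is illustrative rather than the paper's own:

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\, \frac{\partial x}{\partial \theta} \right]
\qquad\text{vs.}\qquad
\mathcal{L}_{\mathrm{MSE}}(\theta) = \sum_{v} \big\lVert R(\theta, v) - I_v^{\mathrm{repaint}} \big\rVert_2^2
$$

Here $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction, $w(t)$ a timestep weighting, $R(\theta, v)$ the Gaussian Splatting render from viewpoint $v$, and $I_v^{\mathrm{repaint}}$ the corresponding repainted image. The MSE form avoids the repeated diffusion-model evaluations that make SDS slow.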
The paper presents comprehensive experimental evidence for Repaint123's performance and efficiency. Quantitative evaluations with CLIP-Similarity, Contextual-Distance, PSNR, and LPIPS support the claims of improved multi-view consistency and fine-grained texture accuracy, and show that Repaint123 produces 3D content of competitive or superior visual fidelity while requiring significantly less generation time.
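For readers reproducing the evaluation, PSNR and LPIPS are straightforward to compute. The sketch below uses the public `lpips` package and assumes image tensors in [0, 1], which may differ from the paper's exact evaluation setup; CLIP-Similarity and Contextual-Distance require their own pretrained models and are omitted here.

```python
# Hedged evaluation sketch: PSNR computed directly, LPIPS via the `lpips`
# package (https://github.com/richzhang/PerceptualSimilarity).
# Tensor layout and value ranges are assumptions, not the paper's code.

import torch
import lpips


def psnr(pred: torch.Tensor, target: torch.Tensor) -> float:
    """PSNR in dB for images in [0, 1], shape (N, 3, H, W)."""
    mse = torch.mean((pred - target) ** 2)
    return (10.0 * torch.log10(1.0 / mse)).item()


# LPIPS expects inputs scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net="alex")


def lpips_distance(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Mean LPIPS distance for images in [0, 1], shape (N, 3, H, W)."""
    return lpips_fn(pred * 2.0 - 1.0, target * 2.0 - 1.0).mean().item()
```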
From a practical perspective, the implications of Repaint123's capabilities extend beyond theoretical interest, pointing to industrial applications in augmented and virtual reality, where fast, reliable content generation is critical. Its methodological advances in neural representation learning could also influence ongoing research in computer graphics, particularly on efficient and scalable texture synthesis.
The researchers also briefly engage with recent advances in alternative 3D representations, such as Neural Radiance Fields (NeRF), and contrast them with their Gaussian Splatting-based approach. The adaptive repainting strategy could inspire future innovations in rendering technologies, improving both the computational overhead and the qualitative output of image-based 3D synthesis.
Repaint123 thus bridges 2D image refinement and full 3D modeling, charting a path toward higher-fidelity rendered content at lower computational cost. While Gaussian Splatting currently has limitations in geometric fidelity, ongoing advances in this area are expected to further broaden its applicability to image-to-3D generation.
Through rigorous experimentation, strategic methodology, and careful engineering, the paper establishes Repaint123 as a significant contribution to computational imaging and 3D content generation. Its ability to produce high-quality results quickly makes it a promising candidate for mainstream adoption in applications that require fast, accurate 3D object generation from minimal input data.