Text2Tex: Text-driven Texture Synthesis via Diffusion Models (2303.11396v1)

Published 20 Mar 2023 in cs.CV

Abstract: We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from the given text prompts. Our method incorporates inpainting into a pre-trained depth-aware image diffusion model to progressively synthesize high resolution partial textures from multiple viewpoints. To avoid accumulating inconsistent and stretched artifacts across views, we dynamically segment the rendered view into a generation mask, which represents the generation status of each visible texel. This partitioned view representation guides the depth-aware inpainting model to generate and update partial textures for the corresponding regions. Furthermore, we propose an automatic view sequence generation scheme to determine the next best view for updating the partial texture. Extensive experiments demonstrate that our method significantly outperforms the existing text-driven approaches and GAN-based methods.

Authors (5)
  1. Dave Zhenyu Chen (12 papers)
  2. Yawar Siddiqui (14 papers)
  3. Hsin-Ying Lee (60 papers)
  4. Sergey Tulyakov (108 papers)
  5. Matthias Nießner (177 papers)
Citations (153)

Summary

  • The paper presents a depth-aware diffusion approach to synthesize high-quality 3D mesh textures directly from textual descriptions.
  • It integrates inpainting and automatic view selection to dynamically refine textures and minimize artifacts across multiple viewpoints.
  • Experiments demonstrate significant improvements in FID and KID scores, highlighting enhanced realism and consistency over traditional methods.

Text2Tex: High-Quality Texture Generation for 3D Meshes Through Text Prompts

The paper introduces Text2Tex, a novel framework for generating high-quality textures for 3D meshes from textual descriptions. The work builds on advances in diffusion models, specifically leveraging a depth-aware text-to-image generation approach. The method addresses the intrinsic challenge of synthesizing textures that are consistent with a given object's geometry and faithful to the text prompt, overcoming limitations of prior text-driven and GAN-based methods.
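To make the depth-aware backbone concrete, the sketch below calls a publicly available depth-conditioned diffusion pipeline from Hugging Face diffusers as a stand-in. The checkpoint, sampler settings, and input file names are assumptions for illustration, not the authors' released configuration.

```python
# Minimal sketch of a depth-aware text-to-image step, using the diffusers
# depth-to-image pipeline as a stand-in for the backbone described in the paper.
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder inputs: in the actual pipeline these would be rendered from the
# mesh at the current viewpoint (hypothetical file names for this sketch).
render_rgb = Image.open("render_rgb.png").convert("RGB")
render_depth = torch.load("render_depth.pt")  # [1, 1, H, W] depth from the mesh

image = pipe(
    prompt="a weathered leather armchair",
    image=render_rgb,        # current partial render of the object
    depth_map=render_depth,  # geometry conditioning from the mesh
    strength=1.0,            # repaint this view from scratch
).images[0]
image.save("generated_view.png")
```

The generated view would then be back-projected onto the texture atlas, which is the progressive, per-viewpoint loop the following section describes.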

Methodological Contributions

Text2Tex employs a depth-aware image diffusion model to progressively synthesize partial textures from multiple viewpoints. The approach integrates inpainting into the diffusion process and dynamically segments each rendered view into a generation mask that records the generation status of every visible texel. This partitioned view representation guides the depth-aware inpainting model in generating and updating partial textures for the corresponding regions, counteracting common issues such as inconsistent or stretched artifacts accumulating across views. A small sketch of such a mask-building step follows.
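The following sketch shows one plausible way to build a per-view generation mask from a screen-space map of texel status. The status categories ("new", "update", "keep"), the cosine-based refinement criterion, and the threshold value are assumptions chosen for illustration; the paper defines its own partitioning criteria.

```python
# Illustrative generation-mask construction (assumed categories and threshold).
import numpy as np

NEW, UPDATE, KEEP = 0, 1, 2

def build_generation_mask(texel_painted: np.ndarray,
                          view_cosine: np.ndarray,
                          prev_cosine: np.ndarray,
                          update_margin: float = 0.1) -> np.ndarray:
    """Classify each visible pixel of the rendered view.

    texel_painted: bool [H, W], True where the underlying texel already has
                   texture from an earlier viewpoint.
    view_cosine:   float [H, W], cosine between surface normal and the current
                   view direction (1.0 = seen head-on).
    prev_cosine:   float [H, W], best cosine achieved by any earlier view.
    """
    mask = np.full(texel_painted.shape, KEEP, dtype=np.uint8)
    # Never-painted texels must be generated from scratch.
    mask[~texel_painted] = NEW
    # Painted texels that this view sees much more head-on than before
    # are worth repainting (refinement).
    better_view = texel_painted & (view_cosine > prev_cosine + update_margin)
    mask[better_view] = UPDATE
    return mask

# Regions marked NEW/UPDATE are handed to the depth-aware inpainting model;
# KEEP regions are left untouched to avoid drift across views.
```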

To further refine the texture synthesis, the authors propose an automatic view sequence generation scheme. This algorithm selects the next best viewpoint for updating the partial texture, ensuring comprehensive coverage of the 3D surface while minimizing artifacts. Together, these components yield significant performance improvements, as demonstrated through extensive experiments. A greedy selection heuristic in this spirit is sketched below.
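The sketch below illustrates a simple greedy next-best-view heuristic: among a fixed set of candidate viewpoints, pick the one that exposes the most still-untextured surface. The helper `render_visibility` is hypothetical (standing in for a renderer), and the paper's actual scoring may weight views differently.

```python
# Greedy next-best-view selection (illustrative heuristic, not the paper's
# exact scoring function).
from typing import Callable, Sequence, Tuple
import numpy as np

def next_best_view(
    candidates: Sequence[Tuple[float, float]],            # (azimuth, elevation)
    render_visibility: Callable[[Tuple[float, float]], np.ndarray],
    texel_painted: np.ndarray,                             # bool per texel in the atlas
) -> Tuple[float, float]:
    best_view, best_score = candidates[0], -1
    for view in candidates:
        visible = render_visibility(view)                  # bool per texel, True if seen
        score = int(np.count_nonzero(visible & ~texel_painted))
        if score > best_score:
            best_view, best_score = view, score
    return best_view
```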

Experimental Results

The evaluations show that Text2Tex outperforms existing text-driven methods and traditional GAN-based approaches. On a subset of the Objaverse dataset, Text2Tex achieves notably lower FID and KID than baselines such as Text2Mesh and Latent-Paint, indicating improved realism and cross-view consistency. For category-specific tasks, such as texturing cars from the ShapeNet dataset, Text2Tex also surpasses specialized GAN-based methods by substantial margins, underscoring its generalization capability.
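For readers unfamiliar with these metrics, the snippet below shows how FID and KID between renders of generated and reference textures could be computed with the torchmetrics implementations. This is a stand-in for, not a reproduction of, the paper's evaluation protocol; the image tensors here are random placeholders.

```python
# Computing FID/KID with torchmetrics (illustrative; placeholder data).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)

# uint8 tensors of shape [N, 3, H, W]; in practice these would be renders of
# reference vs. generated textured meshes.
real_images = torch.randint(0, 255, (100, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 255, (100, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
kid.update(real_images, real=True)
kid.update(fake_images, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item(), "+/-", kid_std.item())
```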

Theoretical and Practical Implications

Text2Tex extends diffusion models from 2D image generation to 3D texture synthesis, opening new pathways for generating 3D content in industries such as gaming, film, and augmented reality. Its ability to produce high-fidelity textures from textual input has significant implications for automating 3D content creation, potentially reducing the labor-intensive processes currently involved in 3D modeling and design.

The paper's exploration into automatic viewpoint selection also provides a foundation for future research in optimizing rendering sequences, which might be leveraged in other computational graphics applications. By refining the interplay between text input and 3D model characteristics, Text2Tex paves the way for more intuitive and user-friendly interfaces in 3D design tools.

Conclusion and Future Work

Text2Tex introduces a robust framework for synthesizing high-quality, text-driven 3D textures, setting a new benchmark in the field. Future explorations could focus on extending the model’s capabilities to handle dynamic textures or integrating more advanced semantic understanding from textual descriptions to further enhance the richness and applicability of generated textures. This work exemplifies the potential of diffusion models in 3D graphics and offers a promising direction for future advancements in AI-driven design and visualization tools.
