
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation (2404.01843v2)

Published 2 Apr 2024 in cs.CV

Abstract: Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, such color-rich inputs are not always available in practical applications, where only sketches may be accessible. Existing sketch-to-3D research suffers from limited applicability due to the challenges posed by missing color information and multi-view content. To overcome these limitations, this paper proposes Sketch3D, a novel generation paradigm that produces realistic 3D assets whose shape aligns with the input sketch and whose color matches the textual description. Concretely, Sketch3D first instantiates the given sketch as a reference image through a shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize the 3D Gaussians: structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss, and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of Sketch3D in generating realistic 3D assets while preserving consistency with the input.


Summary

  • The paper introduces a novel framework that converts sketches into 3D models using style-consistent guidance, enhancing shape fidelity.
  • It employs advanced deep learning techniques to generate coherent 3D structures from freehand inputs, paving the way for interactive design tools.
  • Experimental results show significant improvements in visual consistency and accuracy compared to existing sketch-to-3D methods.

An Overview of ACM's Article Formatting Guidelines

Introduction

The ACM has developed a single, comprehensive template to ensure consistency and readability across its publications. This document provides an overview of the ACM's consolidated article template, introduced in 2017. The template serves multiple functions, from formatting to facilitating metadata extraction and accessibility, both crucial for integration into the ACM Digital Library. The flexibility embedded in its design allows authors to prepare documents for various stages of publication, from submissions for review to camera-ready copies, across both conference proceedings and journal publications.

Templating Nuances

Template Styles and Parameters

The article outlines the different template styles (acmsmall, acmlarge, acmtog, acmconf, sigchi, sigchi-a, sigplan) designed to accommodate the diverse requirements of ACM's publications, including journals and conference proceedings. Each style is chosen based on the nature of the publication and the specific SIG governing the work. The article also discusses template parameters such as anonymous, review, authorversion, and screen, which adjust the template to suit various publication stages and requirements, such as dual-anonymous conference submissions or generating screen-friendly versions.
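As a minimal illustration of how a style and its parameters combine, the preamble below uses standard acmart class options; the particular style and option combination shown is only one possibility:

```latex
% Journal submission in the acmsmall style, anonymized for dual-anonymous
% review; the `screen' option produces a version with colored hyperlinks
% for on-screen reading.
\documentclass[acmsmall, review, anonymous, screen]{acmart}

% A camera-ready copy for the same venue would drop the review-stage
% options, e.g.:
% \documentclass[acmsmall, screen]{acmart}
```

Moving a manuscript between publication stages is thus a matter of editing class options rather than switching templates.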

Prohibited Modifications

A significant emphasis is placed on the strict prohibition against modifying the template. This includes altering fundamental elements such as margins and typeface sizes, as well as using commands that manage vertical spacing. These restrictions are enforced to maintain the integrity and uniformity of ACM publications.

Typeface Requirements

The document stresses the mandatory use of the "Libertine" typeface family and bars substitutions, in order to maintain a standard visual aesthetic across publications. The directive unifies the appearance of ACM works, contributing to a cohesive brand identity.

Title, Authors, and Affiliation Guidelines

Authors are advised on how to appropriately format titles, manage author information, and specify affiliations to ensure clarity and accuracy in the metadata. Precise instructions for handling long titles, multiple authors sharing affiliations, and the necessity of including e-mail addresses are provided to optimize the metadata extraction process.
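A minimal sketch of these metadata commands, with all names, addresses, and institutions as hypothetical placeholders:

```latex
% The optional argument supplies a short title for page headers when the
% full title is too long.
\title[Short Title]{A Rather Long Full Title of the Article}

% Each author gets an \author block with an email address; authors who
% share an affiliation still repeat the \affiliation block so that
% metadata extraction associates every author with an institution.
\author{First Author}
\email{first.author@example.edu}
\affiliation{%
  \institution{Example University}
  \city{Springfield}
  \country{USA}}

\author{Second Author}
\email{second.author@example.edu}
\affiliation{%
  \institution{Example University}
  \city{Springfield}
  \country{USA}}
```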

Rights Information and CCS Concepts

The necessity of including rights management information and the use of the ACM Computing Classification System (CCS) for taxonomic classification of the work is discussed. These components are vital for the legal and academic categorization and discoverability of the articles within the ACM ecosystem and beyond.
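These components are expressed through dedicated commands. The sketch below uses placeholder values, since in practice the rights commands are pasted from the ACM eRights form and the CCS XML is generated by the interactive tool on the ACM website:

```latex
% Rights information (placeholder values; replace with the block from
% the ACM eRights form).
\setcopyright{acmlicensed}
\acmYear{2024}

% CCS concepts: the XML block carries the machine-readable taxonomy,
% and \ccsdesc mirrors it in the printed "CCS Concepts" paragraph.
\begin{CCSXML}
<ccs2012>
 <concept>
  <concept_id>XXXXXXXX.XXXXXXXX</concept_id>
  <concept_desc>Computing methodologies~Computer graphics</concept_desc>
  <concept_significance>500</concept_significance>
 </concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Computing methodologies~Computer graphics}
```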

Formatting and Content Structure

The document provides detailed guidance on structuring the content, including adherence to standard LaTeX sectioning commands and the preparation of tables, math equations, and figures. Particular attention is given to the formatting and placement of tables and figures to enhance readability and accessibility. The imperative of providing accurate figure descriptions is highlighted to facilitate content comprehension for visually impaired readers and improve search engine optimization.
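An illustrative fragment of these conventions, with the figure file name as a placeholder of the author's choosing:

```latex
% Standard LaTeX sectioning commands; acmart styles them automatically.
\section{Evaluation}

% Every figure needs a \Description in addition to its \caption: the
% caption addresses sighted readers, while the description is the alt
% text used by screen readers and indexing tools.
\begin{figure}[t]
  \centering
  \includegraphics[width=\linewidth]{example-figure}
  \caption{One-sentence caption summarizing the figure.}
  \Description{A fuller textual account of what the figure depicts,
    written for visually impaired readers.}
\end{figure}
```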

Citations, Acknowledgments, and Appendices

Clear instructions are given on the preparation of bibliographies using BibTeX, ensuring completeness and accuracy in citations. Guidelines for acknowledging contributions and support are also provided, demarcating a specific acks environment for this section. Lastly, the document delineates how to incorporate appendices effectively, ensuring they are correctly sectioned and integrated into the article.
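A sketch of the closing matter under the conventions just described, with the bibliography file name as a placeholder:

```latex
% Acknowledgments go in the dedicated environment, not in a \section,
% so they can be identified (and suppressed in anonymous review copies).
\begin{acks}
We thank the anonymous reviewers for their helpful comments.
\end{acks}

% BibTeX with the ACM reference format.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

% Appendices follow the bibliography and use ordinary sectioning.
\appendix
\section{Additional Details}
```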

Implications and Future Directions

The establishment of uniform formatting guidelines by the ACM plays a crucial role in the standardization of academic publications in the computing field. By enforcing a consistent structure and visual presentation, these guidelines not only enhance the readability and accessibility of research but also streamline the publication process. Looking ahead, as AI and automated tools become increasingly prevalent in research and publication workflows, the importance of standardized templates and metadata becomes even more pronounced. Future developments in this area may include more sophisticated templates that further ease the publication process while maintaining high standards of accessibility and interactivity. The continuous evolution of these guidelines will likely parallel advances in publishing technologies, with a sustained focus on improving the accessibility, discoverability, and usability of scholarly communications.

In summary, the ACM's consolidation effort in article templating showcases a forward-thinking approach to academic publishing: one that respects the traditions of scholarly communication while embracing the technological advancements that shape its future.
