3D-PreMise: Can Large Language Models Generate 3D Shapes with Sharp Features and Parametric Control? (2401.06437v1)
Abstract: Recent advancements in implicit 3D representations and generative models have markedly advanced the field of 3D object generation. However, accurately modeling geometries with well-defined sharp features under parametric control remains a significant challenge, and this capability is crucial in fields like industrial design and manufacturing. To bridge this gap, we introduce a framework that employs LLMs to generate text-driven 3D shapes by manipulating 3D software via program synthesis. We present 3D-PreMise, a dataset specifically tailored for 3D parametric modeling of industrial shapes, designed to explore state-of-the-art LLMs within our proposed pipeline. We identify effective generation strategies and examine the self-correction capabilities of LLMs using a visual interface. Our findings highlight both the potential and the limitations of LLMs in 3D parametric modeling for industrial applications.