Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute (2401.00711v1)

Published 1 Jan 2024 in cs.CV and cs.AI

Abstract: Generating 3D human models directly from text reduces the cost and time of character modeling. However, realistic 3D human avatar generation with multi-attribute control remains challenging because of feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which generates realistic-style 3D avatars from coupled text prompts. Text2Avatar uses a discrete codebook as an intermediate feature connecting text and avatars, which enables feature disentanglement. Furthermore, to alleviate the scarcity of realistic-style 3D human avatar data, we use a pre-trained unconditional 3D human avatar generation model to produce a large amount of 3D avatar pseudo data, which allows Text2Avatar to achieve realistic-style generation. Experimental results demonstrate that our method can generate realistic 3D avatars from coupled textual data, which is challenging for existing methods in this field.
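
To make the idea of a discrete codebook as an intermediate feature more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the codebook's contents or how text is matched to it, so the attribute names, the two-dimensional code vectors, and the nearest-neighbor matching rule below are all assumptions chosen for illustration. The sketch only shows why a per-attribute codebook can help with disentanglement: each attribute is resolved to its own discrete code independently of the others.

```python
from math import dist

# Hypothetical codebook (an assumption, not taken from the paper):
# each attribute owns a small set of labeled code vectors.
CODEBOOK = {
    "sleeve_length": {"short": (0.0, 1.0), "long": (1.0, 0.0)},
    "lower_garment": {"shorts": (0.0, 1.0), "trousers": (1.0, 0.0)},
}


def quantize(attribute, feature):
    """Return the label of the code vector nearest to `feature` for one attribute."""
    codes = CODEBOOK[attribute]
    return min(codes, key=lambda label: dist(codes[label], feature))


def encode(features):
    """Map per-attribute continuous features to discrete code labels.

    Each attribute is quantized on its own, so changing one feature
    cannot alter the code selected for another attribute.
    """
    return {attr: quantize(attr, vec) for attr, vec in features.items()}


# Illustrative continuous features, standing in for text-derived embeddings.
example = encode({"sleeve_length": (0.9, 0.2), "lower_garment": (0.1, 0.8)})
print(example)
```

In this toy setup, a coupled prompt such as "long sleeves and shorts" would be represented by one code per attribute, and a downstream generator could condition on those codes rather than on the entangled text feature.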
