
AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text (2311.17917v1)

Published 29 Nov 2023 in cs.GR and cs.CV

Abstract: We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions. Existing text-to-avatar methods are either limited to static avatars which cannot be animated or struggle to generate animatable avatars with promising quality and precise pose control. To address these limitations, we propose AvatarStudio, a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars. Specifically, AvatarStudio begins with a low-resolution NeRF-based representation for coarse generation, followed by incorporating SMPL-guided articulation into the explicit mesh representation to support avatar animation and high resolution rendering. To ensure view consistency and pose controllability of the resulting avatars, we introduce a 2D diffusion model conditioned on DensePose for Score Distillation Sampling supervision. By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars from text that are ready for animation, significantly outperforming previous methods. Moreover, it is competent for many applications, e.g., multimodal avatar animations and style-guided avatar creation. For more results, please refer to our project page: http://jeff95.me/projects/avatarstudio.html
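The abstract names Score Distillation Sampling (SDS) as the supervision signal. The following is a minimal sketch of a single-sample SDS gradient as it is commonly formulated, not the authors' implementation. It assumes the gradient is taken with respect to the rendered image itself (no NeRF or mesh renderer in the loop). It also replaces the DensePose-conditioned diffusion model with a hypothetical toy noise predictor whose "data distribution" is a single known target image. The names `sds_gradient`, `make_toy_predictor`, `alpha_bar`, and `weight` are illustrative and do not come from the paper.

```python
import numpy as np


def sds_gradient(rendered, predict_noise, alpha_bar, weight, rng):
    """Single-sample SDS gradient with respect to the rendered image.

    rendered      : current rendering (any array shape)
    predict_noise : callable (noisy_image, alpha_bar) -> predicted noise;
                    in a real system this would be a diffusion model that
                    also receives the text prompt and a DensePose map
    alpha_bar     : cumulative noise-schedule coefficient in (0, 1)
    weight        : timestep weighting w(t)
    """
    # Forward diffusion: perturb the rendering with Gaussian noise.
    eps = rng.standard_normal(rendered.shape)
    noisy = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * eps
    # SDS uses the residual between predicted and injected noise as the
    # gradient, skipping backpropagation through the noise predictor.
    eps_hat = predict_noise(noisy, alpha_bar)
    return weight * (eps_hat - eps)


def make_toy_predictor(target):
    """Hypothetical stand-in for a diffusion model (assumption).

    If the data distribution is a point mass at `target`, the exact noise
    predictor has this closed form.
    """
    def predict_noise(noisy, alpha_bar):
        return (noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)
    return predict_noise


# Toy optimization loop: gradient descent on the image pulls it toward
# the predictor's target, which plays the role of "what the prompt wants".
rng = np.random.default_rng(0)
target = np.full((4, 4), 0.8)
rendered = np.zeros((4, 4))
predictor = make_toy_predictor(target)
for _ in range(200):
    alpha_bar = rng.uniform(0.1, 0.9)  # random noise level per step
    rendered -= 0.05 * sds_gradient(rendered, predictor, alpha_bar, 1.0, rng)
```

In a full pipeline the gradient would be propagated through a differentiable renderer to the NeRF or mesh parameters, and the predictor would be conditioned on the text prompt and a DensePose rendering of the posed body; both are omitted here.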
