
DreaMo: Articulated 3D Reconstruction From A Single Casual Video (2312.02617v2)

Published 5 Dec 2023 in cs.CV and cs.GR

Abstract: Articulated 3D reconstruction has valuable applications in various domains, yet it remains costly and demands intensive work from domain experts. Recent advances in template-free learning methods show promising results with monocular videos. Nevertheless, these approaches require comprehensive coverage of all viewpoints of the subject in the input video, which limits their applicability to casually captured videos from online sources. In this work, we study articulated 3D shape reconstruction from a single, casually captured internet video in which the subject's view coverage is incomplete. We propose DreaMo, which jointly performs shape reconstruction and resolves the challenging low-coverage regions using a view-conditioned diffusion prior and several tailored regularizations. In addition, we introduce a skeleton generation strategy to create human-interpretable skeletons from the learned neural bones and skinning weights. We conduct our study on a self-collected internet video collection characterized by incomplete view coverage. DreaMo shows promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation. Extensive qualitative and quantitative studies validate the efficacy of each proposed component and show that existing methods fail to recover correct geometry because of the incomplete view coverage.
