Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video (2401.08742v3)
Abstract: Generating a dynamic 3D object from a single-view video is challenging due to the lack of 4D labeled data. An intuitive approach is to extend previous image-to-3D pipelines by transferring off-the-shelf image generation models through techniques such as score distillation sampling. However, this approach would be slow and expensive to scale, because it requires back-propagating information-limited supervision signals through a large pretrained model. To address this, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality, spacetime-consistent images under different camera views, and then uses them as labeled data to directly reconstruct the 4D content with a 4D Gaussian splatting model. Importantly, our method achieves real-time rendering under continuous camera trajectories. To enable robust reconstruction under sparse views, we introduce an inconsistency-aware confidence-weighted loss, along with a lightly weighted score distillation loss. Extensive experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold speedup over prior art while preserving the quality of novel view synthesis. For example, Efficient4D takes only 10 minutes to model a dynamic object, versus 120 minutes for the previous state-of-the-art model, Consistent4D.
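To make the reconstruction objective concrete, below is a minimal PyTorch-style sketch of how an inconsistency-aware confidence-weighted reconstruction loss could be combined with a lightly weighted score distillation term. The function names (`confidence_weighted_loss`, `total_loss`), the exponential confidence mapping, and the hyperparameter values are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def confidence_weighted_loss(render, target, inconsistency, tau=0.1):
    """Sketch: down-weight pixels whose generated supervision is unreliable.

    render, target: (B, 3, H, W) rendered vs. generated images.
    inconsistency: (B, 1, H, W) per-pixel inconsistency estimate, e.g. color
        variance across overlapping generated views (an assumption here).
    tau: temperature controlling how sharply confidence decays (assumption).
    """
    # High inconsistency -> low confidence -> small gradient contribution.
    confidence = torch.exp(-inconsistency / tau)
    per_pixel = (render - target).pow(2).mean(dim=1, keepdim=True)
    return (confidence * per_pixel).mean()

def total_loss(render, target, inconsistency, sds_term, lam_sds=1e-3):
    """Combine the confidence-weighted reconstruction loss with a lightly
    weighted score-distillation term; lam_sds is an illustrative value."""
    return confidence_weighted_loss(render, target, inconsistency) + lam_sds * sds_term
```

Here `sds_term` stands in for a score distillation loss computed against a pretrained diffusion model; the small `lam_sds` reflects the abstract's "lightly weighted" design, keeping the directly reconstructed supervision dominant.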
- Mixamo. https://www.mixamo.com/, 2023.
- Sketchfab. https://sketchfab.com/, 2023.
- Hexplane: A fast representation for dynamic scenes. In ICCV, 2023.
- Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. In ICCV, 2023.
- Objaverse-xl: A universe of 10m+ 3d objects. arXiv preprint, 2023a.
- Objaverse: A universe of annotated 3d objects. In CVPR, 2023b.
- K-planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023.
- Dreamtime: An improved optimization strategy for text-to-3d content creation. arXiv preprint, 2023.
- Real-time intermediate flow estimation for video frame interpolation. In ECCV, 2022.
- Consistent4d: Consistent 360° dynamic object generation from monocular video. arXiv preprint, 2023.
- Shap-e: Generating conditional 3d implicit functions. arXiv preprint, 2023.
- 3d gaussian splatting for real-time radiance field rendering. ACM TOG, 2023.
- Segment anything. In ICCV, 2023.
- Neural 3d video synthesis from multi-view video. In CVPR, 2022.
- Focaldreamer: Text-driven 3d editing via focal-fusion assembly. arXiv preprint, 2023a.
- Dynibar: Neural dynamic image-based rendering. In CVPR, 2023b.
- Magic3d: High-resolution text-to-3d content creation. In CVPR, 2023.
- Devrf: Fast deformable voxel radiance fields for dynamic scenes. In NeurIPS, 2022.
- One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. arXiv preprint, 2023a.
- Zero-1-to-3: Zero-shot one image to 3d object. In ICCV, 2023b.
- Syncdreamer: Learning to generate multiview-consistent images from a single-view image. arXiv preprint, 2023c.
- Wonder3d: Single image to 3d using cross-domain diffusion. arXiv preprint, 2023.
- Realfusion: 360° reconstruction of any object from a single image. In CVPR, 2023.
- Latent-nerf for shape-guided generation of 3d shapes and textures. In CVPR, 2023.
- Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 2021.
- Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint, 2022.
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM TOG, 2021.
- Dreamfusion: Text-to-3d using 2d diffusion. In ICLR, 2023.
- D-nerf: Neural radiance fields for dynamic scenes. In CVPR, 2021.
- Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors. arXiv preprint, 2023.
- Learning transferable visual models from natural language supervision. In ICML, 2021.
- Let 2d diffusion model know 3d-consistency for robust text-to-3d generation. arXiv preprint, 2023.
- Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In CVPR, 2023.
- Mvdream: Multi-view diffusion for 3d generation. arXiv preprint, 2023.
- Make-a-video: Text-to-video generation without text-video data. arXiv preprint, 2022.
- Text-to-4d dynamic scene generation. In ICML, 2023.
- Make-it-3d: High-fidelity 3d creation from a single image with diffusion prior. arXiv preprint, 2023.
- Textmesh: Generation of realistic 3d meshes from text prompts. In 3DV, 2024.
- Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In NeurIPS, 2021.
- Image quality assessment: from error visibility to structural similarity. IEEE TIP, 2004.
- Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In NeurIPS, 2023.
- Hd-fusion: Detailed text-to-3d generation leveraging multiple noise estimation. arXiv preprint, 2023.
- Neurallift-360: Lifting an in-the-wild 2d photo to a 3d object with 360° views. In CVPR, 2023.
- Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint, 2023.
- Hifa: High-fidelity text-to-3d with advanced diffusion guidance. arXiv preprint, 2023.
- High-quality video view interpolation using a layered representation. ACM TOG, 2004.
- Ewa volume splatting. In IEEE Visualization, 2001.