- The paper presents a novel framework integrating MPM simulations with video diffusion priors to refine dynamic 3D Gaussian scenes.
- It employs score distillation sampling with truncated back-propagation through time to optimize physical parameters and ensure stable convergence.
- Experimental results show competitive performance against PhysDreamer while highlighting challenges in simulating diverse 4D motion interactions.
DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors
Introduction
"DreamPhysics" introduces a novel framework for learning physical properties of dynamic 3D Gaussians with video diffusion priors. Amidst rapidly expanding demand for realistic dynamic 3D interactions in applications such as VR and video gaming, this work aims to bridge the gap between static 3D assets and dynamic simulation by integrating physics-based simulation with video diffusion models. Unlike prior approaches that rely on manual parameter assignments and suffer from unrealistic simulations, DreamPhysics optimizes physical parameters through distillation of video generative models, yielding more realistic motions in 4D content.
Figure 1: (a): the setting of physical properties significantly affects the quality of simulated videos; (b): current video diffusion models can hardly be controlled to generate the desired results.
Method Overview
DreamPhysics combines material point method (MPM) simulation with video diffusion model guidance to refine the physical parameters of a static 3D Gaussian Splatting (3D GS) scene. Starting from an initial guess of the parameters, the framework simulates the scene with MPM and renders a 4D video; the parameters are then iteratively refined using score distillation sampling (SDS) gradients computed on the rendered videos. This refinement continues until the physical parameters converge and produce realistic 4D scenes.
Figure 2: Overview of DreamPhysics. We first initialize a set of physical parameters for a static 3D GS, which is then fed through a series of optimizations to refine physical parameters based on simulated results.
Parameter Optimization
The core of DreamPhysics is a pipeline of differentiable MPM simulation and differentiable 3D GS rendering. Because both components are differentiable, gradients can be back-propagated through them to optimize the physical parameters: the SDS gradients flow backward through the rendering and simulation steps into the physical properties, refining them over the course of training.
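For concreteness, the following PyTorch sketch shows how such an SDS-driven loop could be wired up. It is an illustration, not the authors' implementation: `simulate`, `render`, and `prior` (with its `add_noise`, `predict_noise`, and `weight` methods), along with the parameter names, are hypothetical placeholders for a differentiable MPM simulator, a differentiable 3D GS renderer, and a frozen video diffusion model.

```python
import torch

def sds_loss(video, prior, t_range=(0.02, 0.98)):
    """Score distillation sampling loss on a rendered video tensor [T, C, H, W].
    `prior` is assumed to expose add_noise / predict_noise / weight."""
    t = torch.empty(1, device=video.device).uniform_(*t_range)  # random diffusion timestep
    noise = torch.randn_like(video)
    noisy = prior.add_noise(video, noise, t)          # forward diffusion
    with torch.no_grad():
        eps_pred = prior.predict_noise(noisy, t)      # frozen video diffusion prior
    grad = prior.weight(t) * (eps_pred - noise)       # SDS gradient w.r.t. the video
    return (grad.detach() * video).sum()              # surrogate loss: d(loss)/d(video) == grad

def optimize_physics(gaussians, camera, simulate, render, prior,
                     num_particles, num_frames=14, num_epochs=300, lr=1e-2):
    """Refine per-particle physical parameters (here: log Young's modulus) by
    back-propagating the SDS signal through rendering and simulation."""
    log_E = torch.zeros(num_particles, requires_grad=True)
    opt = torch.optim.Adam([log_E], lr=lr)
    for _ in range(num_epochs):
        states = simulate(gaussians, log_E.exp(), num_frames)     # differentiable MPM rollout
        video = torch.stack([render(s, camera) for s in states])  # render each frame with 3D GS
        loss = sds_loss(video, prior)
        opt.zero_grad()
        loss.backward()   # gradients flow through render + simulate into log_E
        opt.step()
    return log_E.exp()
```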
Additionally, truncated back-propagation through time (BPTT), paired with frame interpolation, mitigates vanishing and exploding gradients, providing a stable update path at each training epoch.
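A minimal sketch of how such a truncated rollout might look, assuming the simulation state is a single tensor and `step` is a hypothetical differentiable one-frame MPM update (neither is taken from the paper's code):

```python
import torch

def rollout_truncated(state0, params, num_frames, segment_len, step):
    """Differentiable rollout with truncated BPTT: the state is detached every
    `segment_len` frames, so gradients only flow within each segment."""
    states, state = [], state0
    for i in range(num_frames):
        state = step(state, params)      # one differentiable simulation step
        states.append(state)
        if (i + 1) % segment_len == 0:
            state = state.detach()       # cut the gradient path between segments
    return states
```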
Frame Interpolation and Log Gradient
The frame-interpolation strategy segments the video frames into multiple groups, each providing a shorter optimization window that covers diverse motion scenarios. Logarithmic gradient updates are used to equalize the granularity of updates across physical parameters whose magnitudes vary widely, ensuring stable convergence for properties such as Young's modulus.
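The paper's exact update rule is not reproduced here, but one common way to balance update granularity for a parameter such as Young's modulus, which can span several orders of magnitude, is to optimize it in log space so that each step corresponds to a multiplicative change. A small illustrative sketch (the loss is a dummy stand-in for the SDS objective):

```python
import math
import torch

# Young's modulus may range from roughly 1e4 to 1e8 Pa, so optimizing it directly
# gives wildly uneven effective step sizes. Optimizing theta = log(E) instead
# makes every update multiplicative in E.
theta = torch.tensor(math.log(1e6), requires_grad=True)   # init E = 1e6 Pa
opt = torch.optim.Adam([theta], lr=0.05)

def dummy_loss(E):
    # Stand-in for the SDS loss obtained after simulating with modulus E.
    return (E - 5e5) ** 2 / 1e10

for _ in range(100):
    E = theta.exp()        # always positive; gradient scales with E
    loss = dummy_loss(E)
    opt.zero_grad()
    loss.backward()
    opt.step()
```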
Experiments
DreamPhysics is evaluated under both text-conditioned and image-conditioned optimization, demonstrating its effectiveness in tuning physical parameters for realistic simulations.
Figure 3: Text-conditioned optimization. (a): if Young's modulus is set too low, the ficus tilts excessively; (b): if set too high, the oscillation becomes barely noticeable.
Figure 4: Image-conditioned optimization. (a): Excessive deformation with low modulus; (b): With high modulus, deformation is insufficient.
Comparison and Discussion
In comparative evaluations against PhysDreamer, DreamPhysics shows competitive performance by distilling video priors more effectively. Unlike PhysDreamer, which primarily relies on ground-truth videos generated by SVD, DreamPhysics optimizes the physical parameters directly through video prior distillation.
Figure 5: Comparison with our concurrent work PhysDreamer.
Nonetheless, challenges remain, such as expanding the diversity of simulated motions beyond collisions and swaying, and formulating physics-based metrics that better evaluate simulation quality. The current simulators' limited ability to handle extensive scene interactions also leaves room for improvement.
Conclusion
DreamPhysics exemplifies a methodological advance in optimizing and learning physical dynamics within 3D simulations. By integrating video diffusion priors with physical property optimization, it achieves more realistic movement generation. This research paves the way for future exploration in dynamic 3D scene simulation, offering promising directions for expanding the variety and complexity of simulated interactions in virtual environments.