EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation

(2312.02256)
Published Dec 4, 2023 in cs.CV, cs.AI, and cs.GR

Abstract

We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a latent space can be a non-trivial effort. On the other hand, accelerating generation by naively increasing the sampling step size, e.g., DDIM, often leads to quality degradation as it fails to approximate the complex denoising distribution. To address these issues, we propose EMDM, which captures the complex distribution during multiple sampling steps in the diffusion model, allowing for much fewer sampling steps and significant acceleration in generation. This is achieved by a conditional denoising diffusion GAN to capture multimodal data distributions among arbitrary (and potentially larger) step sizes conditioned on control signals, enabling fewer-step motion sampling with high fidelity and diversity. To minimize undesired motion artifacts, geometric losses are imposed during network learning. As a result, EMDM achieves real-time motion generation and significantly improves the efficiency of motion diffusion models compared to existing methods while achieving high-quality motion generation. Our code will be publicly available upon publication.

Overview

  • Efficient Motion Diffusion Model (EMDM) significantly speeds up the generation of high-quality human motion with fewer sampling steps.

  • EMDM uses a Conditional Denoising Diffusion Generative Adversarial Network to capture the complexities of human motion for efficient synthesis.

  • The model incorporates geometric loss functions to enhance the visual quality and stability of generated motions.

  • EMDM demonstrates superior motion quality and significantly reduced generation times compared to state-of-the-art models.

  • Future enhancements of EMDM include adding physical constraints to improve plausibility and exploring new modalities such as visual or musical cues.

Efficient Motion Generation with EMDM: A Faster Way to Animate Human Motion

Creating human-like motion quickly and effectively is one of the biggest hurdles for animators and AI researchers, and generating high-quality human motion has been a long-standing challenge in artificial intelligence. Traditional diffusion models, known for their high-quality results, often fall short when speed matters, as in real-time animation synthesis. Addressing this issue, a novel approach called the Efficient Motion Diffusion Model (EMDM) has been introduced to deliver speed without compromising motion quality.

EMDM: Quick and Quality-Driven

The core advancement of EMDM lies in its ability to significantly reduce the number of sampling steps needed in the motion generation process. It employs a conditional denoising diffusion generative adversarial network (GAN): a network that learns the complex denoising distributions of human motion conditioned on control signals such as text descriptions or action labels, allowing motion to be synthesized accurately in far fewer steps.
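To make the "fewer, larger steps" idea concrete, here is a minimal Python/PyTorch sketch of how few-step conditional sampling could look. The names `denoiser`, `cond_emb`, and `alphas_cumprod` are placeholders rather than EMDM's actual API, and the network is assumed to predict a clean motion directly, which is then re-noised to the next, much lower noise level.

```python
# Minimal sketch (not the authors' code) of few-step conditional sampling.
# `denoiser` is a hypothetical network that, given a noisy motion x_t, the
# current step t, and a condition embedding c (e.g., encoded text or an
# action label), predicts a clean motion sample in one shot.
import torch

def few_step_sample(denoiser, cond_emb, shape, alphas_cumprod,
                    steps=(999, 749, 499, 249, 0)):
    """Sample a motion sequence with a handful of large denoising steps."""
    x_t = torch.randn(shape)                          # start from pure Gaussian noise
    for t, t_next in zip(steps[:-1], steps[1:]):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        x0_pred = denoiser(x_t, t_batch, cond_emb)    # implicit denoising distribution
        if t_next > 0:
            # Re-noise the predicted clean motion to the next (much smaller) noise
            # level, analogous to the posterior sampling used by diffusion GANs.
            a_next = alphas_cumprod[t_next]
            x_t = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * torch.randn_like(x0_pred)
        else:
            x_t = x0_pred
    return x_t
```

With only four transitions instead of hundreds, each step must bridge a large gap in noise level, which is exactly why a simple Gaussian step (as in DDIM with large strides) degrades quality and an expressive, adversarially trained denoiser is used instead.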

Boosting Efficiency with Conditional GANs

The conditional denoising diffusion GAN at the heart of EMDM pairs a generator with a discriminator. The generator produces denoised motion, while the discriminator judges its authenticity. Both components are conditioned on the diffusion time step and the given input signals, efficiently capturing the required human dynamics. This lets the denoising process take far fewer, larger sampling steps without a loss in motion quality.
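The sketch below illustrates one adversarial training step in this style. It is an assumption-laden outline, not EMDM's implementation: `q_sample` and `q_posterior` stand for the usual forward-diffusion and posterior-sampling helpers, and a standard non-saturating GAN loss is used in place of whatever exact objective the paper adopts.

```python
# Hedged sketch of the adversarial training idea behind a conditional
# denoising diffusion GAN (names and signatures are illustrative).
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, q_sample, q_posterior,
               x0, t, cond_emb, opt_g, opt_d):
    """One adversarial step; both networks see the diffusion step t and the condition."""
    x_t   = q_sample(x0, t)            # forward-diffuse real motion to noise level t
    x_tm1 = q_posterior(x0, x_t, t)    # "real" less-noisy sample from the true posterior

    # --- Discriminator: distinguish real vs. generated denoised samples ---
    x0_fake    = generator(x_t, t, cond_emb).detach()
    x_tm1_fake = q_posterior(x0_fake, x_t, t)
    d_real = discriminator(x_tm1, x_t, t, cond_emb)
    d_fake = discriminator(x_tm1_fake, x_t, t, cond_emb)
    loss_d = F.softplus(-d_real).mean() + F.softplus(d_fake).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- Generator: fool the discriminator while conditioned on t and c ---
    x0_fake    = generator(x_t, t, cond_emb)
    x_tm1_fake = q_posterior(x0_fake, x_t, t)
    loss_g = F.softplus(-discriminator(x_tm1_fake, x_t, t, cond_emb)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

Because the discriminator compares denoising transitions rather than raw samples, the generator can learn a multimodal, non-Gaussian denoising distribution, which is what makes large step sizes viable.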

Enhancing Motion Quality with Geometric Loss

In motion generation, subtleties matter greatly. To refine generated motions and reduce artifacts, EMDM integrates geometric loss functions during training. These losses impose detailed constraints on joint positions, velocities, and foot contacts, enhancing both the visual quality and stability of the generated motion.
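As a rough illustration of the kinds of terms such geometric losses involve, the snippet below combines position, velocity, and foot-contact penalties. The tensor shapes, foot joint indices, and equal weighting are assumptions for the example, not values taken from the paper.

```python
# Illustrative sketch of geometric losses on joint positions, velocities,
# and foot contact; indices and weights are assumptions, not EMDM's values.
import torch
import torch.nn.functional as F

def geometric_losses(pred_joints, gt_joints, foot_contact, foot_idx=(7, 8, 10, 11)):
    """pred_joints, gt_joints: (batch, frames, joints, 3); foot_contact: (batch, frames, n_feet)."""
    loss_pos = F.mse_loss(pred_joints, gt_joints)

    # Velocity loss: frame-to-frame differences should match the ground truth.
    pred_vel = pred_joints[:, 1:] - pred_joints[:, :-1]
    gt_vel   = gt_joints[:, 1:] - gt_joints[:, :-1]
    loss_vel = F.mse_loss(pred_vel, gt_vel)

    # Foot-contact loss: feet labelled as in contact should not slide.
    foot_vel = pred_vel[:, :, list(foot_idx), :]   # velocities of (assumed) foot joints
    contact  = foot_contact[:, 1:]                 # align contact labels with velocity frames
    loss_foot = (foot_vel.pow(2).sum(-1) * contact).mean()

    return loss_pos + loss_vel + loss_foot
```

Penalizing foot velocity only on frames labelled as in contact is what suppresses the familiar foot-skating artifact, while the position and velocity terms keep the overall trajectory smooth and accurate.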

Remarkable Speed with No Quality Compromise

The result of integrating these methods is a motion generation system that is remarkably faster than its contemporaries while achieving competitive or superior fidelity and diversity compared to state-of-the-art models. In tasks such as action-to-motion and text-to-motion, EMDM requires only a fraction of the time previously needed, with the average running time per sequence dropping significantly.

Future Directions

Despite the robustness of EMDM, there is room for further improvement. Some generated motions exhibit issues such as floating or ground penetration because the generation process imposes no physical constraints; future work aims to incorporate such constraints to ensure physically plausible animations. In addition, EMDM has yet to be fully explored across other modalities, such as visual inputs or musical cues, which suggests exciting avenues for research.

EMDM’s innovation offers a glimpse into the future of motion synthesis, where quality animation can be produced rapidly and efficiently. As the technology continues to evolve, it holds promise for various real-world applications, paving the way for advancements in gaming, virtual reality, and beyond. The code for EMDM is to be made publicly available, allowing researchers and developers to continue building and enhancing this powerful tool for efficient motion generation.
