Abstract

Generating natural human motion from a story has the potential to transform the landscape of the animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters are required to move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text descriptions, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary LLMs as a text-driven motion scheduler to extract a series of (text, position, duration) triples from long text. (2) We develop a text-driven motion retrieval scheme that incorporates motion matching with both motion-semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in transition motions, such as unnatural poses and foot sliding. Beyond its pioneering role as the first comprehensive solution for Story-to-Motion, our system is evaluated on three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.

Overview

  • The paper introduces Story-to-Motion, a system for synthesizing character animations from detailed text narratives with both infinite scope and close textual alignment.

  • Story-to-Motion fuses kinematics and semantic action interpretation to produce lifelike motion sequences from textual descriptions.

  • The proposed system consists of three modules: a text-driven motion scheduler, a text-based motion retrieval module, and a neural motion blending module.

  • The system outperforms existing methods in trajectory adherence, temporal action composition, and motion blending, demonstrating its effectiveness in generating extended sequences.

  • This work could revolutionize filmmaking and game development by enabling natural, effectively infinite character animations driven directly by narrative text.

Introduction

The art of crafting virtual worlds and characters that move in harmony with compelling narratives is a challenge that spans across the animation, gaming, and film industries. At the forefront of this field is the novel task known as Story-to-Motion, which strives to synthesize character animations that are both infinite in scope and closely aligned with textual descriptions.

Overview of Story-to-Motion

The Story-to-Motion process begins by taking a detailed textual narrative (a "story") and transforming it into a carefully constructed sequence of character motions. What sets Story-to-Motion apart is its holistic approach: it attends both to the exacting details of kinematics (specifically, the trajectories characters follow) and to the broader semantic meaning of the actions described in the text. Traditional approaches have fallen short in this regard, either following trajectories closely while ignoring the text, or generating brief semantic motions without the position control needed for longer, trajectory-informed animations. The newly introduced system aims to transcend these limitations.

Methodology

The system proposed in the paper comprises three interconnected modules:

  1. Text-driven Motion Scheduler: Utilizing a Large Language Model, this module parses the input story and distills it into a list of character actions, locations, and temporal spans. Given some knowledge of the 3D scene, the mentioned locations can be translated into continuous trajectories via a path-finding algorithm; a minimal sketch of this step follows the list.
  2. Text-based Motion Retrieval: In this step, a motion database is queried via an auto-regressive retrieval function to find clips that match the text while portraying the motion realistically. The retrieval cost combines kinematic and semantic features to preserve the fidelity of both the motion and the narrative; see the second sketch below.
  3. Neural Motion Blending: This final module stitches the selected motion clips into a seamless, natural motion sequence. To overcome common blending issues, such as jarring transitions or mismatches in motion style, the authors develop a progressive mask transformer; see the third sketch below.
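
The scheduler can be pictured as a two-stage pipeline: an LLM call that extracts (text, position, duration) triples, followed by a path-finder that turns named locations into waypoints. The sketch below is a minimal illustration of that idea; the prompt format and the names `extract_schedule`, `find_path`, and `scene_map` are our assumptions, not the authors' actual API.

```python
# Hypothetical sketch of the text-driven motion scheduler. An LLM
# decomposes the story into (action, location, duration) triples, and
# a path-finding routine (e.g., A* on a walkability grid) resolves
# named locations into continuous trajectories.
import json

SCHEDULER_PROMPT = """Decompose the story into an ordered JSON list of
steps, each with "action" (a short motion description), "location"
(a named place in the scene), and "duration" (seconds).

Story: {story}"""

def extract_schedule(story: str, llm) -> list[dict]:
    """Ask the LLM for (action, location, duration) triples."""
    reply = llm(SCHEDULER_PROMPT.format(story=story))
    return json.loads(reply)  # e.g. [{"action": "walk to the desk", ...}]

def schedule_to_plan(schedule, scene_map, find_path):
    """Resolve named locations into waypoint trajectories with any
    path-finding routine; `scene_map` holds named positions and a
    walkability grid (both assumptions of this sketch)."""
    plan, current = [], scene_map.spawn_point
    for step in schedule:
        goal = scene_map.locations[step["location"]]
        waypoints = find_path(scene_map.grid, current, goal)
        plan.append((step["action"], waypoints, step["duration"]))
        current = goal
    return plan
```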
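
For the retrieval module, the key idea is motion matching with a combined cost: a kinematic term keeps the next clip consistent with the character's current pose and trajectory, and a semantic term keeps it consistent with the action text. Below is a minimal sketch under that reading; the feature layout, the Euclidean costs, and the weight `w_sem` are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal auto-regressive motion matching with a kinematic + semantic cost.
import numpy as np

def retrieve_clip(db_kin, db_sem, query_kin, query_sem, w_sem=1.0):
    """Pick the clip whose start best continues the current motion
    (kinematic) while matching the action text (semantic).

    db_kin:    (N, Dk) kinematic features of each clip's first frames
               (e.g., joint positions/velocities, local trajectory)
    db_sem:    (N, Ds) embeddings of each clip's action label
    query_kin: (Dk,) features of the character's current state
    query_sem: (Ds,) embedding of the current action text
    """
    kin_cost = np.linalg.norm(db_kin - query_kin, axis=1)
    sem_cost = np.linalg.norm(db_sem - query_sem, axis=1)
    return int(np.argmin(kin_cost + w_sem * sem_cost))

def synthesize(plan, db, encode_text, state):
    """Auto-regressively chain clips: each retrieval conditions on the
    kinematic state left behind by the previous clip. Trajectory
    constraints from `waypoints` would be folded into the kinematic
    query; we pass the raw state here for brevity."""
    clips = []
    for action_text, waypoints, duration in plan:
        idx = retrieve_clip(db.kin, db.sem, state, encode_text(action_text))
        clips.append(db.clips[idx])
        state = db.end_state[idx]  # continue from the clip's final frame
    return clips
```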
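
Finally, blending can be framed as masked in-filling: the transition frames between two retrieved clips are masked out, and a transformer predicts them from the surrounding context, with the mask shrunk progressively so that later passes condition on frames committed in earlier passes. The sketch below is our loose reading of that idea; the layer sizes, the masking schedule, and all hyperparameters are placeholders, not the published progressive mask transformer.

```python
# Loose PyTorch sketch of transition in-filling with a progressively
# shrinking mask. Architecture details are placeholders.
import torch
import torch.nn as nn

class TransitionInfiller(nn.Module):
    def __init__(self, pose_dim=63, d_model=256, n_layers=4, max_len=512):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses, mask):
        # poses: (B, T, pose_dim); mask: (B, T), True where frames are unknown
        x = self.embed(poses)
        x[mask] = self.mask_token             # hide the unknown frames
        x = x + self.pos[:, :x.shape[1]]
        return self.head(self.encoder(x))

@torch.no_grad()
def blend(model, clip_a, clip_b, gap=15, passes=4):
    """Fill `gap` transition frames between two clips, committing the
    frames nearest the known context first on each pass."""
    pose_dim = clip_a.shape[-1]
    seq = torch.cat([clip_a, torch.zeros(gap, pose_dim), clip_b])[None]
    mask = torch.zeros(seq.shape[:2], dtype=torch.bool)
    start = clip_a.shape[0]
    mask[:, start:start + gap] = True
    for _ in range(passes):
        pred = model(seq, mask)
        seq = torch.where(mask[..., None], pred, seq)
        idx = mask[0].nonzero().squeeze(-1)
        if len(idx) > 2:                      # shrink the mask from both ends
            mask[:, idx[0]] = mask[:, idx[-1]] = False
    return seq[0]
```

In practice the same in-filling would also be conditioned on trajectory targets so the transition stays on the planned path; that conditioning is omitted here for brevity.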

Experimental Results and Contributions

Evaluations reveal that the new system outperforms existing state-of-the-art methods on key sub-tasks: trajectory following, temporal action composition, and motion blending. Not only does it improve motion quality on short sequences, it also excels at synthesizing extended ones, a testament to its versatile applicability.

The contributions of this work are threefold: a comprehensive solution for infinite, lifelike motion generation tied to textual narratives; a text-driven, controllable system for long human animations; and empirical evidence of superior performance across standard benchmarks.

Final Thoughts

Story-to-Motion represents a significant leap towards more natural and infinite character animation in response to narrative text. With the promise of enhancing the animation pipeline and offering new creative tools to filmmakers and game developers, it opens the door to an era where dynamic character animations are no longer bound by the limits of pre-scripted motion paths, but flow as freely as the stories they are born from.
