Abstract

Generating natural human motion from a story has the potential to transform the landscape of the animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters are required to move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text descriptions, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary LLMs as a text-driven motion scheduler to extract a series of (text, position, duration) triples from long text. (2) We develop a text-driven motion retrieval scheme that incorporates motion matching with both motion-semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in transition motions, such as unnatural poses and foot sliding. Beyond its pioneering role as the first comprehensive solution for Story-to-Motion, our system is evaluated on three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.

Overview

  • The paper introduces Story-to-Motion, a system for synthesizing character animations from detailed text narratives with both infinite scope and close textual alignment.

  • Story-to-Motion fuses kinematics and semantic action interpretation to produce lifelike motion sequences from textual descriptions.

  • The proposed system consists of three modules: a text-driven motion scheduler, a text-based motion retrieval module, and a neural motion blending module.

  • The system outperforms existing methods in trajectory adherence, temporal action composition, and motion blending, demonstrating its effectiveness in generating extended sequences.

  • This work could revolutionize filmmaking and game development by enabling natural, effectively infinite character animations driven directly by narrative text.

Introduction

The art of crafting virtual worlds and characters that move in harmony with compelling narratives is a challenge that spans across the animation, gaming, and film industries. At the forefront of this field is the novel task known as Story-to-Motion, which strives to synthesize character animations that are both infinite in scope and closely aligned with textual descriptions.

Overview of Story-to-Motion

The Story-to-Motion process begins by taking a detailed textual narrative (a "story") and transforming it into a carefully constructed sequence of character motions. What sets Story-to-Motion apart is its holistic approach: it attends both to the exacting details of kinematics (specifically, the trajectories characters follow) and to the broader semantic meaning of the actions described in the text. Traditional approaches have fallen short in this regard, either following trajectories closely while ignoring the text, or generating brief semantic motions without the position control needed for longer, trajectory-informed animations. The newly introduced system aims to transcend these limitations.

Methodology

The system proposed in the paper comprises three interconnected modules:

  1. Text-driven Motion Scheduler: Utilizing a Large Language Model, this module parses the input story and distills it into a list of character actions, locations, and temporal spans. Given some knowledge of the 3D scene, the mentioned locations can be translated into continuous trajectories via a path-finding algorithm; a minimal sketch of this step follows the list.
  2. Text-based Motion Retrieval: In this step, a motion database is queried via an auto-regressive retrieval function to find clips that match the text while portraying the motion realistically. The retrieval cost combines kinematic and semantic features to preserve the fidelity of both the motion and the narrative; see the second sketch below.
  3. Neural Motion Blending: This final module stitches the selected motion clips into a seamless, natural motion sequence. To overcome common blending issues, such as jarring transitions or mismatches in motion style, the authors develop a progressive mask transformer; see the third sketch below.
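
The scheduler can be pictured as a two-stage pipeline: an LLM call that extracts (text, position, duration) triples, followed by a path-finder that turns named locations into waypoints. The sketch below is a minimal illustration of that idea; the prompt format and the names `extract_schedule`, `find_path`, and `scene_map` are our assumptions, not the authors' actual API.

```python
# Hypothetical sketch of the text-driven motion scheduler. An LLM
# decomposes the story into (action, location, duration) triples, and
# a path-finding routine (e.g., A* on a walkability grid) resolves
# named locations into continuous trajectories.
import json

SCHEDULER_PROMPT = """Decompose the story into an ordered JSON list of
steps, each with "action" (a short motion description), "location"
(a named place in the scene), and "duration" (seconds).

Story: {story}"""

def extract_schedule(story: str, llm) -> list[dict]:
    """Ask the LLM for (action, location, duration) triples."""
    reply = llm(SCHEDULER_PROMPT.format(story=story))
    return json.loads(reply)  # e.g. [{"action": "walk to the desk", ...}]

def schedule_to_plan(schedule, scene_map, find_path):
    """Resolve named locations into waypoint trajectories with any
    path-finding routine; `scene_map` holds named positions and a
    walkability grid (both assumptions of this sketch)."""
    plan, current = [], scene_map.spawn_point
    for step in schedule:
        goal = scene_map.locations[step["location"]]
        waypoints = find_path(scene_map.grid, current, goal)
        plan.append((step["action"], waypoints, step["duration"]))
        current = goal
    return plan
```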
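
For the retrieval module, the key idea is motion matching with a combined cost: a kinematic term keeps the next clip consistent with the character's current pose and trajectory, and a semantic term keeps it consistent with the action text. Below is a minimal sketch under that reading; the feature layout, the Euclidean costs, and the weight `w_sem` are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal auto-regressive motion matching with a kinematic + semantic cost.
import numpy as np

def retrieve_clip(db_kin, db_sem, query_kin, query_sem, w_sem=1.0):
    """Pick the clip whose start best continues the current motion
    (kinematic) while matching the action text (semantic).

    db_kin:    (N, Dk) kinematic features of each clip's first frames
               (e.g., joint positions/velocities, local trajectory)
    db_sem:    (N, Ds) embeddings of each clip's action label
    query_kin: (Dk,) features of the character's current state
    query_sem: (Ds,) embedding of the current action text
    """
    kin_cost = np.linalg.norm(db_kin - query_kin, axis=1)
    sem_cost = np.linalg.norm(db_sem - query_sem, axis=1)
    return int(np.argmin(kin_cost + w_sem * sem_cost))

def synthesize(plan, db, encode_text, state):
    """Auto-regressively chain clips: each retrieval conditions on the
    kinematic state left behind by the previous clip. Trajectory
    constraints from `waypoints` would be folded into the kinematic
    query; we pass the raw state here for brevity."""
    clips = []
    for action_text, waypoints, duration in plan:
        idx = retrieve_clip(db.kin, db.sem, state, encode_text(action_text))
        clips.append(db.clips[idx])
        state = db.end_state[idx]  # continue from the clip's final frame
    return clips
```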
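
Finally, blending can be framed as masked in-filling: the transition frames between two retrieved clips are masked out, and a transformer predicts them from the surrounding context, with the mask shrunk progressively so that later passes condition on frames committed in earlier passes. The sketch below is our loose reading of that idea; the layer sizes, the masking schedule, and all hyperparameters are placeholders, not the published progressive mask transformer.

```python
# Loose PyTorch sketch of transition in-filling with a progressively
# shrinking mask. Architecture details are placeholders.
import torch
import torch.nn as nn

class TransitionInfiller(nn.Module):
    def __init__(self, pose_dim=63, d_model=256, n_layers=4, max_len=512):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses, mask):
        # poses: (B, T, pose_dim); mask: (B, T), True where frames are unknown
        x = self.embed(poses)
        x[mask] = self.mask_token             # hide the unknown frames
        x = x + self.pos[:, :x.shape[1]]
        return self.head(self.encoder(x))

@torch.no_grad()
def blend(model, clip_a, clip_b, gap=15, passes=4):
    """Fill `gap` transition frames between two clips, committing the
    frames nearest the known context first on each pass."""
    pose_dim = clip_a.shape[-1]
    seq = torch.cat([clip_a, torch.zeros(gap, pose_dim), clip_b])[None]
    mask = torch.zeros(seq.shape[:2], dtype=torch.bool)
    start = clip_a.shape[0]
    mask[:, start:start + gap] = True
    for _ in range(passes):
        pred = model(seq, mask)
        seq = torch.where(mask[..., None], pred, seq)
        idx = mask[0].nonzero().squeeze(-1)
        if len(idx) > 2:                      # shrink the mask from both ends
            mask[:, idx[0]] = mask[:, idx[-1]] = False
    return seq[0]
```

In practice the same in-filling would also be conditioned on trajectory targets so the transition stays on the planned path; that conditioning is omitted here for brevity.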

Experimental Results and Contributions

Evaluations reveal that the new system outperforms existing state-of-the-art methods on key sub-tasks: trajectory following, temporal action composition, and motion blending. Not only does it improve motion quality on short sequences, it also excels at synthesizing extended ones, a testament to its versatile applicability.

The contributions of this work are threefold: a comprehensive solution for infinite, lifelike motion generation tied to textual narratives; a text-driven, controllable system for long human animations; and empirical evidence of superior performance across standard benchmarks.

Final Thoughts

Story-to-Motion represents a significant leap towards more natural and infinite character animation in response to narrative text. With the promise of enhancing the animation pipeline and offering new creative tools to filmmakers and game developers, it opens the door to an era where dynamic character animations are no longer bound by the limits of pre-scripted motion paths, but flow as freely as the stories they are born from.
