Neural Rendering of Humans in Novel View and Pose from Monocular Video (2204.01218v2)

Published 4 Apr 2022 in cs.CV

Abstract: We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input. Despite significant recent progress on this topic, with several methods exploring shared canonical neural radiance fields in dynamic scenes, learning a user-controlled model for unseen poses remains challenging. To tackle this problem, we introduce an effective method to a) integrate observations across several frames and b) encode the appearance at each individual frame. We accomplish this by using as input both the human pose, which models the body shape, and point clouds that partially cover the human. Our approach simultaneously learns a shared set of latent codes anchored to the human pose across frames, and an appearance-dependent code anchored to the incomplete point cloud generated from each frame and its predicted depth. The pose-based code models the shape of the performer, whereas the point-cloud-based code predicts fine-level details and reasons about missing structure at unseen poses. To further recover non-visible regions in query frames, we employ a temporal transformer to integrate features of points in query frames with tracked body points from automatically selected key frames. Experiments on various dynamic-human sequences from different datasets, including ZJU-MoCap, show that our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
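
To make the dual-code design concrete, below is a minimal PyTorch sketch of how a pose-anchored shape code and a point-cloud-anchored appearance code might jointly condition a NeRF-style color/density head, with a temporal transformer fusing query-frame point features and tracked key-frame features. All module names, dimensions, and feature interfaces are hypothetical illustrations under these assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class DualCodeNeRF(nn.Module):
    """Illustrative sketch: a radiance field conditioned on two latent
    codes (a pose-anchored shape code and a point-cloud-anchored
    appearance code), with a temporal transformer fusing query-frame
    point features and tracked key-frame features. Names and sizes
    are assumptions, not the authors' implementation."""

    def __init__(self, pose_dim=64, appear_dim=64, hidden=256):
        super().__init__()
        # Temporal transformer: attends jointly over query-frame point
        # features and tracked body-point features from key frames.
        layer = nn.TransformerEncoderLayer(
            d_model=appear_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        # MLP maps (position, shape code, fused appearance code) to
        # color and density, as in a standard NeRF head.
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim + appear_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, xyz, pose_code, query_feats, keyframe_feats):
        # xyz:            (B, N, 3) sampled 3D points
        # pose_code:      (B, N, pose_dim) interpolated pose-anchored codes
        # query_feats:    (B, N, appear_dim) query-frame point-cloud features
        # keyframe_feats: (B, K, appear_dim) tracked key-frame features
        n = query_feats.shape[1]
        tokens = torch.cat([query_feats, keyframe_feats], dim=1)
        fused = self.temporal(tokens)[:, :n]  # keep the query-point tokens
        out = self.mlp(torch.cat([xyz, pose_code, fused], dim=-1))
        rgb = torch.sigmoid(out[..., :3])
        sigma = torch.relu(out[..., 3:])
        return rgb, sigma


# Example: one batch, 1024 sampled points, 300 tracked key-frame points.
model = DualCodeNeRF()
rgb, sigma = model(torch.randn(1, 1024, 3), torch.randn(1, 1024, 64),
                   torch.randn(1, 1024, 64), torch.randn(1, 300, 64))
```

In this sketch the pose code supplies coarse body shape at each sample point, while the transformer-fused point-cloud code supplies per-frame appearance detail and lets key-frame observations fill in regions not visible in the query frame.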

Citations (3)
