Learning the 3D Fauna of the Web (2401.02400v2)

Published 4 Jan 2024 in cs.CV

Abstract: Learning 3D models of all animals on the Earth requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by simply learning from 2D Internet images. We show that prior category-specific attempts fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward fashion within seconds.

Summary

The paper presents 3D-Fauna, a framework that leverages a Semantic Bank of Skinned Models to jointly model diverse quadruped species from 2D images.
The paper employs self-supervised feature extraction and Non-Rigid Structure-from-Motion to reconstruct articulated 3D meshes from single-view internet photos.
The paper demonstrates superior performance on a curated Fauna dataset, outperforming existing methods in both qualitative and quantitative evaluations across over 100 species.

Introduction

In computer vision, reconstructing humans in 3D from images has advanced significantly, enabling applications such as virtual reality, gaming, and animation. Comparable progress for other subjects has been limited, largely because of the complexity and data requirements involved. 3D-Fauna aims to change this with a single deformable 3D animal model that handles a broad range of quadruped species and is learned solely from 2D images sourced from the internet.

Semantic Bank of Skinned Models

3D-Fauna is built around the Semantic Bank of Skinned Models (SBSM), a technique that learns one joint model for many animal species at once. Joint learning matters most for rare animals, which have few training images of their own and can borrow structure from better-represented species. The method combines a small bank of base shapes with semantic knowledge captured implicitly by an off-the-shelf self-supervised feature extractor, so that commonalities and differences between animal shapes are learned together in a pan-category model. The resulting model can be deformed and posed to match a given image of a four-legged animal.
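One way to picture a bank lookup of this kind is as soft attention over a small set of learned entries. The sketch below is only an illustration under stated assumptions: the names `query_bank`, `keys`, `values`, and `temperature`, the use of cosine similarity, and the toy numbers are all hypothetical and are not taken from the paper. It shows how an image feature could select a blend of base-shape codes, so that a rare species reuses entries shaped mostly by common ones.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def query_bank(feature, keys, values, temperature=0.1):
    """Blend bank entries according to their similarity to an image feature.

    feature: semantic feature of the input image (assumed to come from a
             self-supervised extractor; here it is just a list of floats).
    keys:    one key vector per bank entry (hypothetical learned parameters).
    values:  one latent base-shape code per bank entry (hypothetical).
    """
    sims = [cosine(feature, k) / temperature for k in keys]
    weights = softmax(sims)
    dim = len(values[0])
    code = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, code

# Toy bank with three entries; the values are one-hot so the blended
# code directly mirrors the attention weights.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, code = query_bank([0.9, 0.1], keys, values)
```

In this toy setup the query feature lies closest to the first key, so the first base-shape code dominates the blend. A lower `temperature` makes the selection sharper, while a higher one mixes entries more evenly.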

Unsupervised Learning Approach

3D-Fauna's training works around the absence of multi-view or 3D data for most animals. It uses principles from Non-Rigid Structure-from-Motion together with self-supervised features to reconstruct animals from single-view internet images. A mask discriminator further encourages the predicted shapes to look realistic from multiple viewpoints, which counteracts the viewpoint bias of typical front-facing internet photos. Training proceeds in three stages that focus, in order, on registering base shapes, learning articulation, and capturing instance-specific detail.
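The three-stage schedule can be sketched as a simple step-based switch. This is a minimal illustration, not the authors' training code: the stage names follow the description above, but the step boundaries and the assumption that earlier components stay active in later stages are hypothetical.

```python
# Each entry is (component name, first training step at which it is enabled).
# The step values are placeholders chosen only for illustration.
STAGES = [
    ("base_shapes", 0),
    ("articulation", 1000),
    ("instance_detail", 2000),
]

def enabled_components(step):
    """Return the model components being trained at a given step.

    Assumes a cumulative schedule: once a component is switched on,
    it keeps training alongside the components added after it.
    """
    return [name for name, start in STAGES if step >= start]

for step in (0, 1500, 2500):
    print(step, enabled_components(step))
```

Staging of this kind lets the coarse shape settle before articulation and fine detail are fitted, which is the ordering the description above implies.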

Fauna Dataset and Performance

The system was trained on the specially curated Fauna Dataset, which contains images of more than 100 quadruped species. Once trained, 3D-Fauna reconstructs an articulated 3D mesh from a single image. Comparative analyses showed that the approach outperformed existing methods both qualitatively and quantitatively, producing 3D models for animals ranging from commonly photographed species to those barely represented in the available data.

Conclusion

3D-Fauna marks a notable advance in computer vision and animal modeling. It recovers detailed 3D structure for a wide variety of quadruped animals from a single image, and it points toward broader applications that need to capture the full diversity of animal shapes and movements. The approach is currently limited to animals that share a common skeletal plan, namely quadrupeds, and it depends on some level of image curation. Even so, 3D-Fauna sets a new bar for future efforts to model the natural world in three dimensions.