Learning the 3D Fauna of the Web

(2401.02400)
Published Jan 4, 2024 in cs.CV

Abstract

Learning 3D models of all animals on the Earth requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by simply learning from 2D Internet images. We show that prior category-specific attempts fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward fashion within seconds.

Overview

  • Introduction of 3D-Fauna, a framework to create 3D animal models from 2D images

  • Semantic Bank of Skinned Models helps model multiple animal species jointly

  • Utilizes unsupervised learning to reconstruct animals from single-view images

  • Trained on the Fauna Dataset, which covers over 100 species, and produces detailed 3D meshes

  • Outperforms previous methods and has potential for broader applications across industries

Introduction

In computer vision, reconstructing humans in 3D from images has advanced significantly, enabling applications such as virtual reality, gaming, and animation. That progress has largely been confined to human subjects, however, because training data for animals is limited. A new framework named 3D-Fauna aims to change this by learning a single pan-category deformable 3D animal model that covers more than 100 quadruped species, using only 2D images sourced from the internet.

Semantic Bank of Skinned Models

3D-Fauna is built around the Semantic Bank of Skinned Models (SBSM), a technique that learns one joint model for many animal species at once. Joint modeling is what makes rare animals tractable, since they have too few training images to support a category-specific model. The method automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge captured by an off-the-shelf self-supervised feature extractor, so that commonalities and differences between species are shared across the bank. The resulting pan-category model can be deformed and posed to match a given image of a four-legged animal.
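To make the idea of a base shape bank indexed by semantic features more concrete, here is a minimal sketch of one plausible lookup: an image feature is compared to a set of keys, and the corresponding bank entries are blended by similarity. This is an illustration only, not the paper's implementation. The names `query_bank`, `keys`, `values`, and `temperature`, the use of cosine similarity with a softmax, and the toy two-dimensional features are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def query_bank(feature, keys, values, temperature=0.1):
    """Blend bank entries by similarity between an image feature and each key.

    Hypothetical helper: `keys` stand in for semantic descriptors and
    `values` for latent codes of base shapes.
    """
    weights = softmax([cosine(feature, k) / temperature for k in keys])
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, blended

# Toy bank with three entries (all numbers are made up for illustration).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A feature close to the first key should mostly select the first base shape.
feature = [0.9, 0.1]
weights, shape_code = query_bank(feature, keys, values)
```

Because the blend is soft, a species with few images can still land between well-populated neighbours in the bank rather than needing an entry of its own, which is the intuition behind sharing a bank across categories.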

Unsupervised Learning Approach

3D-Fauna's training works around the absence of multi-view or 3D data for most animals. It draws on principles from Non-Rigid Structure-from-Motion and on self-supervised features to reconstruct animals from single-view internet images. A mask discriminator additionally encourages shapes that look realistic from multiple viewpoints, which counteracts the bias introduced by the predominantly front-facing photos found online. Training proceeds in three stages: first registering the base shapes, then learning articulation, and finally capturing instance-specific detail.
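The three-stage schedule can be pictured as a simple step-based switch. The sketch below is hypothetical: the stage names, the step thresholds, and the choice to keep earlier components active once later ones start are assumptions for illustration, not values or behaviour reported in the summary.

```python
# Hypothetical stage names and start steps; the real schedule is not given here.
STAGES = [
    ("base_shapes", 0),         # stage 1: register base shapes
    ("articulation", 1000),     # stage 2: add articulation
    ("instance_detail", 2000),  # stage 3: add per-instance detail
]

def active_stages(step):
    """Return the names of all stages that have started by `step`.

    Assumes stages are cumulative: once enabled, a component stays enabled.
    """
    return [name for name, start in STAGES if step >= start]

def current_focus(step):
    """Return the most recently enabled stage at `step`."""
    return active_stages(step)[-1]
```

For example, `active_stages(1500)` lists the first two stages and `current_focus(1500)` returns `"articulation"`. Delaying the finer components gives the coarse shape time to settle before pose and detail are fitted on top of it.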

Fauna Dataset and Performance

The system was trained on the Fauna Dataset, a newly collected set of images covering more than 100 quadruped species. Once trained, 3D-Fauna converts a single image into a detailed, articulated 3D mesh. In comparisons with existing methods it performed better both qualitatively and quantitatively, and it produced 3D models for commonly photographed species as well as for species with very few available images.

Conclusion

3D-Fauna is a notable advance in computer vision and animal modeling. It can infer the detailed 3D structure of a wide range of quadruped animals from a single image, and it points toward broader applications that need to capture the diversity of animal shapes and movements. The method is currently limited to animals that share a common skeletal plan, namely quadrupeds, and it depends on some degree of image curation. Even so, it sets a new bar for future work on modeling the natural world in three dimensions.
