Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Published 25 Nov 2019 in cs.CV | (1911.11130v2)

Abstract: We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.


Authors (3)

Citations (297)


Summary

The paper presents a novel unsupervised autoencoder that decomposes images into depth, albedo, viewpoint, and illumination for 3D object reconstruction.
It leverages a probabilistic symmetry map to capture inherent, though imperfect, bilateral symmetry, enhancing the accuracy of 3D shape recovery.
Experiments on human faces, cat faces, and cars show accurate shape recovery, and on benchmarks the method achieves better keypoint depth accuracy than another method that uses supervision at the level of 2D image correspondences.


The paper addresses the task of learning three-dimensional (3D) models of deformable object categories from raw single-view images, without any external supervision. Its unsupervised framework exploits the symmetry found in many natural object categories and introduces the notion of probable symmetry to improve the accuracy of the 3D reconstructions.

The method proposed by Wu, Rupprecht, and Vedaldi is an autoencoder trained without annotations or a prior shape model. It decomposes each input image into four intrinsic components: depth, albedo, viewpoint, and illumination. The key innovation is to model objects that are likely, but not certainly, symmetric by predicting a symmetry probability map, which is learned end-to-end with the rest of the network.

The central difficulty is that decomposing a single image into these components is ill-posed when no conventional supervision signal is available. The authors resolve the ambiguity by exploiting the bilateral symmetry present in many object categories. Each object is assumed to be symmetric in a canonical view, so its mirrored image acts as a second, virtual view of the same object. This provides a form of virtual stereopsis, and it still applies when the appearance is asymmetric because of non-uniform lighting.
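The interplay between symmetry and lighting can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a frontal canonical view with no viewpoint reprojection, a simple Lambertian shading model, and hypothetical names and values (`shade`, `hflip`, the `ambient` and `diffuse` weights, the toy bump and light direction). It shows that a left-right symmetric depth and albedo can still produce an asymmetric image under side lighting, and that rendering the mirrored depth and albedo under the same light reproduces the same image, which is the constraint the mirrored "second view" supplies.

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate unit surface normals from a depth map by finite differences."""
    dz_dy, dz_dx = np.gradient(depth)  # gradients along rows (y) and columns (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def shade(albedo, depth, light_dir, ambient=0.3, diffuse=0.7):
    """Lambertian image: albedo * (ambient + diffuse * max(0, <light, normal>)).

    The ambient/diffuse weights are illustrative assumptions.
    """
    light = np.asarray(light_dir, dtype=float)
    light = light / np.linalg.norm(light)
    lambert = np.clip(normals_from_depth(depth) @ light, 0.0, None)
    return albedo * (ambient + diffuse * lambert)

def hflip(x):
    """Mirror a map about its vertical axis."""
    return x[:, ::-1]

# Toy bilaterally symmetric object: a smooth bump with a symmetric albedo.
size = 32
half = (size - 1) / 2
coords = (np.arange(size) - half) / half      # symmetric samples in [-1, 1]
X, Y = np.meshgrid(coords, coords)
depth = 0.5 * (X**2 + Y**2)                   # even in X: left-right symmetric
albedo = 0.5 + 0.4 * np.cos(np.pi * X)        # even in X: left-right symmetric

light_dir = (1.0, 0.0, 1.0)                   # hypothetical light from one side
image = shade(albedo, depth, light_dir)
image_from_flipped = shade(hflip(albedo), hflip(depth), light_dir)

print("albedo symmetric:", np.allclose(albedo, hflip(albedo)))
print("image symmetric:", np.allclose(image, hflip(image)))
print("flipped factors reproduce image:", np.allclose(image, image_from_flipped))
```

Because the asymmetry lives entirely in the shading term, a model that explains illumination separately can keep enforcing symmetry on depth and albedo rather than on raw pixels.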

The model separates illumination from the other factors and then uses it as a cue for 3D shape recovery. In addition, end-to-end training includes uncertainty modeling through the probabilistic symmetry map, which lets the framework tolerate departures from symmetry such as facial hair or irregular textures.
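One way to make such uncertainty modeling concrete is a confidence-weighted reconstruction loss. The summary does not spell out the loss, so the sketch below rests on assumptions: the map is represented as a per-pixel scale `sigma`, and the loss is the negative log-likelihood of a Laplacian distribution. The function name and the toy arrays are hypothetical.

```python
import numpy as np

def confidence_weighted_loss(recon, target, sigma, eps=1e-6):
    """Mean Laplacian negative log-likelihood with a per-pixel scale sigma.

    Large sigma down-weights the error at a pixel, but the log term
    charges a price for claiming uncertainty everywhere.
    """
    sigma = np.maximum(sigma, eps)            # keep the scale strictly positive
    abs_err = np.abs(recon - target)
    nll = np.sqrt(2.0) * abs_err / sigma + np.log(np.sqrt(2.0) * sigma)
    return float(nll.mean())

# Toy case: the reconstruction matches the target except in one 2x2 patch,
# standing in for a region where the symmetry assumption fails.
target = np.zeros((8, 8))
recon = np.zeros((8, 8))
recon[2:4, 5:7] = 1.0

sigma_flat = np.full((8, 8), 0.1)             # confident everywhere
sigma_adaptive = sigma_flat.copy()
sigma_adaptive[2:4, 5:7] = 1.0                # uncertain only in the bad patch
sigma_blanket = np.full((8, 8), 1.0)          # uncertain everywhere

loss_flat = confidence_weighted_loss(recon, target, sigma_flat)
loss_adaptive = confidence_weighted_loss(recon, target, sigma_adaptive)
loss_blanket = confidence_weighted_loss(recon, target, sigma_blanket)
print(loss_flat, loss_adaptive, loss_blanket)
```

In this toy case the lowest loss comes from raising `sigma` only where the error is large, so a network trained with such a loss has an incentive to localize asymmetric regions rather than ignore the symmetry constraint altogether.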

The experiments indicate that the method recovers detailed and accurate 3D shapes for several object categories, most notably human faces, cat faces, and cars. It does so without a prior 3D shape model or 2D image labels, which earlier methods typically required. On keypoint depth accuracy for human faces, the paper reports better results than a method that has access to annotated keypoints.

The work also bears on how few assumptions are needed for high-fidelity 3D reconstruction without supervision. It suggests that extending the representation (e.g., to volumetric or mesh-based approaches) could allow the method to handle broader object classes.

Overall, the research demonstrates that symmetry, even when imperfect, combined with explicit reasoning about lighting and a probabilistic treatment of structure, can yield accurate learned 3D representations. It invites further work on incorporating other forms of prior knowledge to overcome current limitations.
