From 2D Images to 3D Model:Weakly Supervised Multi-View Face Reconstruction with Deep Fusion (2204.03842v5)
Abstract: While weakly supervised multi-view face reconstruction (MVR) is garnering increased attention, one critical issue still remains open: how to effectively interact and fuse multiple image information to reconstruct high-precision 3D models. In this regard, we propose a novel pipeline called Deep Fusion MVR (DF-MVR) to explore the feature correspondences between multi-view images and reconstruct high-precision 3D faces. Specifically, we present a novel multi-view feature fusion backbone that utilizes face masks to align features from multiple encoders and integrates one multi-layer attention mechanism to enhance feature interaction and fusion, resulting in one unified facial representation. Additionally, we develop one concise face mask mechanism that facilitates multi-view feature fusion and facial reconstruction by identifying common areas and guiding the network's focus on critical facial features (e.g., eyes, brows, nose, and mouth). Experiments on Pixel-Face and Bosphorus datasets indicate the superiority of our model. Without 3D annotation, DF-MVR achieves 5.2% and 3.0% RMSE improvement over the existing weakly supervised MVRs respectively on Pixel-Face and Bosphorus dataset. Code will be available publicly at https://github.com/weiguangzhao/DF_MVR.