
Multi-view 3D Reconstruction with Transformer (2103.12957v1)

Published 24 Mar 2021 in cs.CV

Abstract: Deep CNN-based methods have so far achieved state-of-the-art results in multi-view 3D object reconstruction. Despite the considerable progress, the two core modules of these methods, multi-view feature extraction and fusion, are usually investigated separately, and the object relations in different views are rarely explored. In this paper, inspired by the recent great success of self-attention-based Transformer models, we reformulate multi-view 3D reconstruction as a sequence-to-sequence prediction problem and propose a new framework named 3D Volume Transformer (VolT) for this task. Unlike previous CNN-based methods using a separate design, we unify feature extraction and view fusion in a single Transformer network. A natural advantage of our design lies in the exploration of view-to-view relationships using self-attention among multiple unordered inputs. On ShapeNet - a large-scale 3D reconstruction benchmark dataset - our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters ($70\%$ less) than other CNN-based methods. Experimental results also suggest the strong scaling capability of our method. Our code will be made publicly available.
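The abstract's key claim is that self-attention over a set of unordered view embeddings naturally captures view-to-view relations, and that pooling the attended features gives a fusion that does not depend on view order. The following minimal numpy sketch illustrates that property; the shapes, weight matrices, and pooling choice are illustrative assumptions, not the paper's actual VolT architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # tokens: (N, d), one embedding per view (hypothetical encoder output).
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return scores @ V  # each view attends to every other view

rng = np.random.default_rng(0)
d = 8  # illustrative embedding size
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
views = rng.normal(size=(5, d))  # 5 unordered view embeddings

# Mean-pooling the attended tokens gives one fused feature for the object.
fused = self_attention(views, Wq, Wk, Wv).mean(axis=0)

# Self-attention is permutation-equivariant, so after pooling the fused
# feature is identical for any ordering of the input views.
perm = rng.permutation(5)
fused_perm = self_attention(views[perm], Wq, Wk, Wv).mean(axis=0)
assert np.allclose(fused, fused_perm)
```

This order-invariance is why a sequence-to-sequence formulation fits multi-view inputs, which, unlike words in a sentence, have no canonical ordering.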

Authors (8)
  1. Dan Wang (154 papers)
  2. Xinrui Cui (4 papers)
  3. Xun Chen (166 papers)
  4. Zhengxia Zou (52 papers)
  5. Tianyang Shi (14 papers)
  6. Septimiu Salcudean (14 papers)
  7. Z. Jane Wang (54 papers)
  8. Rabab Ward (18 papers)
Citations (75)
