Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 65 tok/s
Gemini 2.5 Pro 47 tok/s Pro
GPT-5 Medium 39 tok/s Pro
GPT-5 High 32 tok/s Pro
GPT-4o 97 tok/s Pro
Kimi K2 164 tok/s Pro
GPT OSS 120B 466 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Multi-task 3D building understanding with multi-modal pretraining (2306.10146v1)

Published 16 Jun 2023 in cs.CV and eess.IV

Abstract: This paper explores various learning strategies for 3D building type classification and part segmentation on the BuildingNet dataset. ULIP with PointNeXt and PointNeXt segmentation are extended for the classification and segmentation task on BuildingNet dataset. The best multi-task PointNeXt-s model with multi-modal pretraining achieves 59.36 overall accuracy for 3D building type classification, and 31.68 PartIoU for 3D building part segmentation on validation split. The final PointNeXt XL model achieves 31.33 PartIoU and 22.78 ShapeIoU on test split for BuildingNet-Points segmentation, which significantly improved over PointNet++ model reported from BuildingNet paper, and it won the 1st place in the BuildingNet challenge at CVPR23 StruCo3D workshop.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (22)
  1. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2016.
  2. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.
  3. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
  4. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3075–3084, 2019.
  5. Synthetic 3d data generation pipeline for geometric deep learning in architecture. arXiv preprint arXiv:2104.12564, 2021.
  6. Benjamin Graham and Laurens Van der Maaten. Submanifold sparse convolutional networks. arXiv preprint arXiv:1706.01307, 2017.
  7. Open-vocabulary object detection via vision and language knowledge distillation. arXiv preprint arXiv:2104.13921, 2021.
  8. Scenenet: understanding real world indoor scenes with synthetic data. arxiv preprint (2015). arXiv preprint arXiv:1511.07041, 3, 2015.
  9. What makes imagenet good for transfer learning? arXiv preprint arXiv:1608.08614, 2016.
  10. Kitti-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
  11. Rethinking network design and local geometry in point cloud: A simple residual mlp framework. arXiv preprint arXiv:2202.07123, 2022.
  12. Partnet: A large-scale benchmark for fine-grained and hierarchical part-level 3d object understanding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 909–918, 2019.
  13. Slip: Self-supervision meets language-image pre-training. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVI, pages 529–544. Springer, 2022.
  14. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
  15. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Advances in Neural Information Processing Systems, 35:23192–23204, 2022.
  16. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  17. Buildingnet: Learning to label 3d buildings. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10397–10407, 2021.
  18. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2446–2454, 2020.
  19. Vitruvio: 3d building meshes via single perspective sketches. arXiv preprint arXiv:2210.13634, 2022.
  20. Ulip: Learning unified representation of language, image and point cloud for 3d understanding. arXiv preprint arXiv:2212.05171, 2022.
  21. Ulip-2: Towards scalable multimodal pre-training for 3d understanding. arXiv preprint arXiv:2305.08275, 2023.
  22. Point transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 16259–16268, 2021.
Citations (1)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Authors (1)

Youtube Logo Streamline Icon: https://streamlinehq.com