Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 72 tok/s
Gemini 2.5 Pro 57 tok/s Pro
GPT-5 Medium 43 tok/s Pro
GPT-5 High 23 tok/s Pro
GPT-4o 107 tok/s Pro
Kimi K2 219 tok/s Pro
GPT OSS 120B 465 tok/s Pro
Claude Sonnet 4 39 tok/s Pro
2000 character limit reached

Rotation Matters: Generalized Monocular 3D Object Detection for Various Camera Systems (2310.05366v1)

Published 9 Oct 2023 in cs.CV

Abstract: Research on monocular 3D object detection is being actively studied, and as a result, performance has been steadily improving. However, 3D object detection performance is significantly reduced when applied to a camera system different from the system used to capture the training datasets. For example, a 3D detector trained on datasets from a passenger car mostly fails to regress accurate 3D bounding boxes for a camera mounted on a bus. In this paper, we conduct extensive experiments to analyze the factors that cause performance degradation. We find that changing the camera pose, especially camera orientation, relative to the road plane caused performance degradation. In addition, we propose a generalized 3D object detection method that can be universally applied to various camera systems. We newly design a compensation module that corrects the estimated 3D bounding box location and heading direction. The proposed module can be applied to most of the recent 3D object detection networks. It increases AP3D score (KITTI moderate, IoU $> 70\%$) about 6-to-10-times above the baselines without additional training. Both quantitative and qualitative results show the effectiveness of the proposed method.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (25)
  1. Height and uprightness invariance for 3d prediction from a single view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 491–500, 2020.
  2. Deep manta: A coarse-to-fine many-task network for joint 2d and 3d vehicle analysis from monocular image. In CVPR, 2017.
  3. Monocular 3d object detection for autonomous driving. In CVPR, 2016.
  4. Rangedet: In defense of range view for lidar-based 3d object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2918–2927, 2021.
  5. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012.
  6. Homography loss for monocular 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1080–1089, 2022.
  7. Multiple view geometry in computer vision. Cambridge university press, 2003.
  8. Exploring intermediate representation for monocular vehicle pose estimation. In CVPR, 2021.
  9. Geometry-aware data augmentation for monocular 3d object detection. arXiv preprint arXiv:2104.05858, 2021.
  10. Smoke: Single-stage monocular 3d object detection via keypoint estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 996–997, 2020.
  11. Delving into localization errors for monocular 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4721–4730, 2021.
  12. Roi-10d: Monocular lifting of 2d detection to 6d pose and metric shape. In CVPR, 2019.
  13. Automotive radar dataset for deep learning based 3d object detection. In 2019 16th european radar conference (EuRAD), pages 129–132. IEEE, 2019.
  14. Deep learning based 3d object detection for automotive radar and camera. In 2019 16th European Radar Conference (EuRAD), pages 133–136. IEEE, 2019.
  15. Centerfusion: Center-based radar and camera fusion for 3d object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1527–1536, 2021.
  16. Clocs: Camera-lidar object candidates fusion for 3d object detection. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10386–10393. IEEE, 2020.
  17. Monogrnet: A geometric reasoning network for monocular 3d object localization. In AAAI, 2019.
  18. Ground plane polling for 6dof pose estimation of objects on the road. IEEE Transactions on Intelligent Vehicles, 5(3):449–460, 2020.
  19. Rsn: Range sparse net for efficient, accurate lidar 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5725–5734, 2021.
  20. Hitnet: Hierarchical iterative tile refinement network for real-time stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14362–14372, 2021.
  21. Multi-level fusion based 3d object detection from monocular images. In CVPR, 2018.
  22. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4471–4480, 2019.
  23. Objects are different: Flexible monocular 3d object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3289–3298, 2021.
  24. Camera pose matters: Improving depth prediction by mitigating pose distribution bias. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15759–15768, 2021.
  25. Monocular 3d object detection: An extrinsic parameter free approach. In CVPR, 2021.
Citations (4)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube