CascadeV-Det: Cascade Point Voting for 3D Object Detection (2401.07477v1)
Abstract: Anchor-free object detectors are highly efficient at point-based prediction because they need no extra post-processing of anchors. However, unlike 2D grids, the 3D points used in these detectors are often far from the ground-truth center, which makes it hard to regress the bounding boxes accurately. To address this issue, we propose a Cascade Voting (CascadeV) strategy that provides high-quality 3D object detection with point-based prediction. Specifically, CascadeV performs cascade detection using a novel Cascade Voting decoder that combines two new components: an Instance Aware Voting (IA-Voting) module and a Cascade Point Assignment (CPA) module. The IA-Voting module updates the object features of updated proposal points within the bounding box using conditional inverse distance weighting. This prevents features from being aggregated from outside the instance and helps improve detection accuracy. Additionally, since model training can suffer from a lack of proposal points with high centerness, we have developed the CPA module to narrow down the positive assignment threshold across cascade stages. This relaxes the dependence on proposal centerness in the early stages while ensuring an ample quantity of positives with high centerness in the later stages. Experiments show that FCAF3D with our CascadeV achieves state-of-the-art 3D object detection results with 70.4% mAP@0.25 and 51.6% mAP@0.5 on SUN RGB-D and competitive results on ScanNet. Code will be released at https://github.com/Sharpiless/CascadeV-Det
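The abstract does not give the IA-Voting formula, so the following is only a minimal NumPy sketch of the general idea of conditional inverse distance weighting, not the authors' implementation. It assumes an axis-aligned bounding box, a plain 1/d weight, and a zero-vector fallback when no point lies inside the box; the function name `conditional_idw` and all of its parameters are hypothetical.

```python
import numpy as np


def conditional_idw(query, points, feats, box_min, box_max, eps=1e-8):
    """Aggregate features at `query` by inverse distance weighting,
    using only the points that fall inside an axis-aligned box.

    Assumption: the "condition" is box membership; the paper may
    define it differently (for example with oriented boxes).
    """
    query = np.asarray(query, dtype=float)
    points = np.asarray(points, dtype=float)
    feats = np.asarray(feats, dtype=float)
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)

    # Keep only points inside the box so that features from outside
    # the instance never contribute to the aggregate.
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    if not inside.any():
        # Assumed fallback: no in-box neighbours, return zeros.
        return np.zeros(feats.shape[1])

    # Inverse distance weights, normalised to sum to one.
    dist = np.linalg.norm(points[inside] - query, axis=1)
    weights = 1.0 / (dist + eps)
    weights /= weights.sum()
    return weights @ feats[inside]


points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
feats = [[1.0], [3.0], [100.0]]
agg = conditional_idw([0.5, 0.0, 0.0], points, feats,
                      box_min=[-1, -1, -1], box_max=[2, 2, 2])
```

In this toy call the third point lies outside the box and is ignored, while the two in-box points are equidistant from the query, so the aggregate is the plain average of their features.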
Authors: Yingping Liang, Ying Fu