DiffuBox: Refining 3D Object Detection with Point Diffusion (2405.16034v2)
Abstract: Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, struggle to maintain high performance when applied to domains with different sensor setups or geographic locations, and this domain shift often results in poor localization accuracy. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes, and detectors. Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox.
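The abstract says the diffusion model is conditioned on the LiDAR points surrounding a coarse bounding box. The sketch below shows one plausible way to gather those points and express them in the box's own frame. It is a minimal illustration under stated assumptions, not code from the DiffuBox repository. The function name `crop_and_normalize`, the 7-value box layout `[cx, cy, cz, l, w, h, yaw]`, the `context` enlargement factor, and the choice to divide by box size are all hypothetical.

```python
import numpy as np


def crop_and_normalize(points: np.ndarray, box: np.ndarray, context: float = 2.0) -> np.ndarray:
    """Return the points near a coarse 3D box, expressed in that box's normalized frame.

    points:  (N, 3) array of x, y, z coordinates in the sensor frame.
    box:     (7,) array [cx, cy, cz, l, w, h, yaw]; this layout is an assumption.
    context: hypothetical enlargement factor; a point is kept if it lies inside the
             box scaled by `context` along each axis.
    """
    center, size, yaw = box[:3], box[3:6], box[6]

    # Rotation R(yaw) about z; its columns are the box's local axes in the sensor frame.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])

    # Row-vector form of R^T (p - center): move each point into the box's local frame.
    local = (points - center) @ rot

    # Divide by (l, w, h) so the coarse box itself becomes the cube [-0.5, 0.5]^3.
    normalized = local / size

    # Keep only points inside the enlarged context region around the box.
    mask = np.all(np.abs(normalized) <= 0.5 * context, axis=1)
    return normalized[mask]


# Usage: a 4 m x 2 m x 1.5 m box at (10, 5, 0) with zero yaw.
pts = np.array([[12.0, 5.0, 0.0], [50.0, 50.0, 0.0]])
box = np.array([10.0, 5.0, 0.0, 4.0, 2.0, 1.5, 0.0])
print(crop_and_normalize(pts, box))
```

Here the first point, which lies on the front face of the box, maps to (0.5, 0, 0), and the distant second point is dropped. Expressing points relative to the coarse box is one way to make the model's input independent of where the object sits in the scene and of its absolute scale. That property fits the domain-agnostic goal stated above, but how DiffuBox actually frames its input should be checked against the linked implementation.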