U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization (2310.13766v2)
Abstract: Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and, in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net-inspired architecture that extends the current state of the art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and, on the nuScenes dataset, outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% Recall Accuracy.