EAN-MapNet: Efficient Vectorized HD Map Construction with Anchor Neighborhoods (2402.18278v2)
Abstract: High-definition (HD) maps are crucial for autonomous driving systems. Most existing works design map element detection heads based on the DETR decoder. However, the initial queries do not explicitly incorporate physical positional information, and vanilla self-attention entails high computational complexity. Therefore, we propose EAN-MapNet for Efficiently constructing HD maps using Anchor Neighborhoods. First, we design query units based on anchor neighborhoods, allowing non-neighborhood central anchors to effectively assist in fitting the neighborhood central anchors to the target points that represent map elements. Then, we propose grouped local self-attention (GL-SA), which leverages the relative instance relationships among the queries. GL-SA enables direct feature interaction among queries of the same instance, while employing local queries as intermediaries for interaction among queries from different instances. Consequently, GL-SA significantly reduces the computational complexity of self-attention while ensuring ample feature interaction among queries. On the nuScenes dataset, EAN-MapNet achieves state-of-the-art performance of 63.0 mAP after training for 24 epochs, surpassing MapTR by 12.7 mAP. Furthermore, it reduces memory consumption by 8198M compared with MapTRv2.
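To make the complexity argument concrete, the following sketch counts the query-key pairs scored by attention, assuming N map instances with P point queries each. It compares vanilla self-attention over all N·P queries with a simplified grouped scheme: full attention within each instance's group, plus attention among k "local" relay queries per instance that carry information across instances. This is an assumed approximation for illustration only, not the exact GL-SA formulation of EAN-MapNet; the function names and the sizes N = 50, P = 20 are hypothetical.

```python
"""Count attention pairs: vanilla self-attention vs. an assumed grouped-local scheme."""


def vanilla_pairs(num_instances: int, points_per_instance: int) -> int:
    """Every point query attends to every point query: (N*P) x (N*P) scores."""
    total_queries = num_instances * points_per_instance
    return total_queries * total_queries


def grouped_local_pairs(
    num_instances: int, points_per_instance: int, local_per_instance: int = 1
) -> int:
    """Hypothetical grouped scheme (not the paper's exact GL-SA).

    - intra-instance: each instance's P queries attend only to each other (N * P^2);
    - inter-instance: k relay ("local") queries per instance attend to the relay
      queries of every instance ((N * k)^2).
    """
    intra = num_instances * points_per_instance ** 2
    relay_queries = num_instances * local_per_instance
    inter = relay_queries ** 2
    return intra + inter


if __name__ == "__main__":
    n, p = 50, 20  # illustrative sizes (assumed, not taken from the paper)
    full = vanilla_pairs(n, p)
    grouped = grouped_local_pairs(n, p)
    print(f"vanilla: {full:,} pairs")
    print(f"grouped: {grouped:,} pairs ({full / grouped:.1f}x fewer)")
```

With N = 50 and P = 20, vanilla attention scores 1,000,000 pairs, while the grouped scheme scores 22,500 (20,000 within groups plus 2,500 among relay queries), roughly a 44× reduction. The dominant term drops from (NP)^2 to NP^2, i.e. by a factor of about N, which is why restricting dense interaction to queries of the same instance pays off.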
- ChauffeurNet: Learning to drive by imitating the best and synthesizing the worst. arXiv preprint arXiv:1812.03079, 2018.
- Multimodal trajectory predictions for autonomous driving using deep convolutional networks. In 2019 International Conference on Robotics and Automation (ICRA), pages 2090–2096. IEEE, 2019.
- MultiPath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv preprint arXiv:1910.05449, 2019.
- Road-model-based road boundary extraction for high definition map via LiDAR. IEEE Transactions on Intelligent Transportation Systems, 23(10):18456–18465, 2022.
- Ji Zhang and Sanjiv Singh. LOAM: Lidar odometry and mapping in real-time. In Robotics: Science and Systems, volume 2, pages 1–9. Berkeley, CA, 2014.
- ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262, 2017.
- HDMapNet: An online HD map construction and evaluation framework. In 2022 International Conference on Robotics and Automation (ICRA), pages 4628–4634. IEEE, 2022.
- SuperFusion: Multilevel LiDAR-camera fusion for long-range HD map generation. arXiv preprint arXiv:2211.15656, 2022.
- BEVSegFormer: Bird’s eye view semantic segmentation from arbitrary camera rigs. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5935–5943, 2023.
- MapTR: Structured modeling and learning for online vectorized HD map construction. arXiv preprint arXiv:2208.14437, 2022.
- MapTRv2: An end-to-end framework for online vectorized HD map construction. arXiv preprint arXiv:2308.05736, 2023.
- StreamMapNet: Streaming mapping network for vectorized online HD map construction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7356–7365, 2024.
- InsightMapper: A closer look at inner-instance information for vectorized high-definition mapping. arXiv preprint arXiv:2308.08543, 2023.
- ScalableMap: Scalable map learning for online long-range vectorized HD map construction. arXiv preprint arXiv:2310.13378, 2023.
- End-to-end object detection with transformers. In European Conference on Computer Vision, pages 213–229. Springer, 2020.
- Deformable DETR: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159, 2020.
- DN-DETR: Accelerate DETR training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13619–13627, 2022.
- DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605, 2022.
- Anchor DETR: Query design for transformer-based object detection. arXiv preprint arXiv:2109.07107, 2021.
- Conditional DETR for fast training convergence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3651–3660, 2021.
- DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv preprint arXiv:2201.12329, 2022.
- Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- Noam Shazeer. Fast transformer decoding: One write-head is all you need. arXiv preprint arXiv:1911.02150, 2019.
- GQA: Training generalized multi-query transformer models from multi-head checkpoints. arXiv preprint arXiv:2305.13245, 2023.
- Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11106–11115, 2021.
- Complementing onboard sensors with satellite map: A new perspective for HD map construction. arXiv preprint arXiv:2308.15427, 2023.
- MV-Map: Offboard HD-map generation with multi-view consistency. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8658–8668, 2023.
- nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11621–11631, 2020.
- Efficient and robust 2D-to-BEV representation learning via geometry-guided kernel transformer. arXiv preprint arXiv:2206.04584, 2022.
- Lift, Splat, Shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 194–210. Springer, 2020.
- VectorMapNet: End-to-end vectorized HD map learning. In International Conference on Machine Learning, pages 22352–22369. PMLR, 2023.
- End-to-end vectorized HD-map construction with piecewise Bézier curve. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13218–13228, 2023.
- PivotNet: Vectorized pivot learning for end-to-end HD map construction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3672–3682, 2023.