- The paper introduces BEVDet, a modular framework that transforms multi-camera image-view features into the Bird-Eye-View (BEV) for 3D object detection.
- On nuScenes it achieves 31.2% mAP with BEVDet-Tiny and 39.3% mAP with BEVDet-Base, at a fraction of the computational cost of prior image-view-based paradigms.
- A customized data augmentation strategy and a Scale-NMS post-processing step further improve accuracy, particularly for small objects, making BEVDet well suited to autonomous driving.
BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View
The paper presents BEVDet, a novel paradigm for 3D object detection with multi-camera systems that performs detection directly in the Bird-Eye-View (BEV), improving both accuracy and computational efficiency. Because BEV is also the representation used by related tasks such as BEV semantic segmentation, this design addresses the complex demands of autonomous driving with a unified framework in which perception tasks can share components.
Methodology and Framework
BEVDet is architecturally divided into four sequential modules: an image-view encoder, a view transformer, a BEV encoder, and a task-specific detection head. This modular design allows for flexibility and the reuse of components known to be effective in related tasks. The image-view encoder pairs a backbone such as ResNet or SwinTransformer with an FPN-style neck that fuses multi-scale features. The view transformer, following the Lift-Splat-Shoot formulation, lifts image-view features into BEV by predicting a categorical depth distribution for every pixel. The BEV encoder then refines the resulting BEV features before a task-specific head, adapted from LiDAR-based detectors such as CenterPoint, predicts the 3D boxes.
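The sketch below outlines this pipeline in PyTorch. The module and argument names are illustrative assumptions, not the authors' implementation, and the "splat" step assumes the frustum-to-BEV cell assignment (`bev_index`) has already been precomputed from the camera intrinsics and extrinsics.

```python
import torch
import torch.nn as nn

class ViewTransformer(nn.Module):
    """Lift-Splat-style view transformer (simplified, hypothetical names)."""

    def __init__(self, in_ch=256, out_ch=64, depth_bins=59):
        super().__init__()
        self.depth_bins = depth_bins
        self.out_ch = out_ch
        # One conv head jointly predicts depth logits and context features.
        self.head = nn.Conv2d(in_ch, depth_bins + out_ch, kernel_size=1)

    def forward(self, feat, bev_index, num_cells):
        # feat: (N_cam, C, H, W) features from the image-view encoder.
        x = self.head(feat)
        depth = x[:, :self.depth_bins].softmax(dim=1)        # (N, D, H, W)
        context = x[:, self.depth_bins:]                     # (N, C', H, W)
        # "Lift": outer product gives one feature per (camera, depth, pixel).
        frustum = depth.unsqueeze(2) * context.unsqueeze(1)  # (N, D, C', H, W)
        frustum = frustum.permute(0, 1, 3, 4, 2).reshape(-1, self.out_ch)
        # "Splat": sum the features that fall into the same BEV grid cell.
        # bev_index (N*D*H*W,) maps each frustum point to a flat cell id.
        bev = frustum.new_zeros(num_cells, self.out_ch)
        bev.index_add_(0, bev_index, frustum)
        return bev  # reshaped to (C', X, Y) outside, given the grid shape

class BEVDet(nn.Module):
    """Four stages: image-view encoder -> view transformer -> BEV encoder -> head."""

    def __init__(self, img_encoder, view_transformer, bev_encoder, det_head):
        super().__init__()
        self.img_encoder = img_encoder
        self.view_transformer = view_transformer
        self.bev_encoder = bev_encoder
        self.det_head = det_head

    def forward(self, images, bev_index, num_cells):
        feat = self.img_encoder(images)          # per-camera image features
        bev = self.view_transformer(feat, bev_index, num_cells)
        bev = self.bev_encoder(bev)              # refine features in BEV space
        return self.det_head(bev)                # 3D boxes, classes, attributes
```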
A critical insight behind BEVDet's training recipe is that the view transformer decouples the image-view and BEV spaces: augmentation applied to the input images regularizes only the image-view encoder, leaving the BEV encoder prone to overfitting. BEVDet therefore applies distinct augmentation strategies in the two spaces, adding BEV-space augmentation (random flipping, rotation, and scaling applied identically to the BEV features and the ground-truth boxes) on top of the usual image-view augmentation. With this recipe, BEVDet achieves robustness and better generalization, aligning its performance with state-of-the-art methods while maintaining a significantly reduced inference time and computational budget.
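A minimal sketch of the BEV-space part of this recipe, restricted to random flips for brevity (the paper also uses random rotation and scaling); the function name and tensor layout are assumptions:

```python
import math
import torch

def bev_flip_augment(bev_feat, gt_boxes):
    """Randomly flip the BEV feature map and ground-truth boxes together.

    bev_feat: (C, X, Y) BEV features centred on the ego vehicle.
    gt_boxes: (M, 7) boxes as (x, y, z, w, l, h, yaw) in the same frame.
    """
    boxes = gt_boxes.clone()
    if torch.rand(1).item() < 0.5:              # flip along the X axis
        bev_feat = torch.flip(bev_feat, dims=[1])
        boxes[:, 0] = -boxes[:, 0]
        boxes[:, 6] = math.pi - boxes[:, 6]
    if torch.rand(1).item() < 0.5:              # flip along the Y axis
        bev_feat = torch.flip(bev_feat, dims=[2])
        boxes[:, 1] = -boxes[:, 1]
        boxes[:, 6] = -boxes[:, 6]
    return bev_feat, boxes
```

Because the same transform is applied to features and targets, the detection head still receives consistent supervision while the BEV encoder is exposed to a richer distribution of scene layouts.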
Experimental Validation
The authors present comprehensive evaluations on the nuScenes dataset, comparing BEVDet against existing paradigms such as FCOS3D, DETR3D, and PGD. The BEVDet-Tiny variant strikes a remarkable balance, reaching 31.2% mAP and 39.2% NDS at roughly 11% of the computational cost of FCOS3D while running at 15.6 FPS. The larger BEVDet-Base configuration sets a new record among vision-based methods with 39.3% mAP and 47.2% NDS. The paper also introduces Scale-NMS, an upgraded Non-Maximum Suppression strategy: small objects such as pedestrians and traffic cones occupy tiny BEV footprints, so duplicate predictions often fail to overlap and classic IoU-based NMS cannot suppress them. Scale-NMS enlarges each box by a class-specific factor before the overlap test, markedly improving accuracy on these categories.
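A minimal sketch of the Scale-NMS idea, using axis-aligned BEV footprints for brevity (the paper operates on rotated BEV boxes, with per-class factors tuned on validation data):

```python
import torch

def scale_nms(boxes, scores, classes, class_scale, iou_thr=0.5):
    """NMS over class-scaled BEV footprints; returns kept box indices.

    boxes: (M, 4) axis-aligned BEV footprints as (x1, y1, x2, y2).
    class_scale: (num_classes,) per-class enlargement factors.
    """
    # Enlarge every footprint about its centre by its class's factor.
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    half_w = (boxes[:, 2] - boxes[:, 0]) / 2 * class_scale[classes]
    half_h = (boxes[:, 3] - boxes[:, 1]) / 2 * class_scale[classes]
    scaled = torch.stack([cx - half_w, cy - half_h,
                          cx + half_w, cy + half_h], dim=1)
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(int(i))
        if order.numel() == 1:
            break
        rest = order[1:]
        # IoU between the SCALED footprints decides suppression.
        lt = torch.maximum(scaled[i, :2], scaled[rest, :2])
        rb = torch.minimum(scaled[i, 2:], scaled[rest, 2:])
        inter = (rb - lt).clamp(min=0).prod(dim=1)
        area_i = (scaled[i, 2:] - scaled[i, :2]).prod()
        area_r = (scaled[rest, 2:] - scaled[rest, :2]).prod(dim=1)
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]
    return torch.tensor(keep, dtype=torch.long)  # indices into the ORIGINAL boxes
```

For instance, `class_scale = torch.tensor([1.0, 3.5])` would leave car footprints untouched while enlarging pedestrian footprints before suppression (illustrative values, not the paper's tuned factors); the returned indices select the original, unscaled boxes.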
Implications and Future Directions
The practical implications of BEVDet for autonomous driving are twofold. It offers a scalable, unified framework that integrates 3D detection with other BEV tasks, facilitating real-time decision-making, and its computational efficiency makes it viable for deployment on embedded systems where resources are limited.
Looking forward, the research opens avenues for improving attribute prediction accuracy by potentially integrating image-view-based methods and exploring multi-task learning within the BEVDet framework. A deeper investigation into the combined use of LiDAR and camera inputs may also enhance the robustness of object detection in diverse environmental conditions.
In summary, BEVDet represents a significant step forward in vision-based 3D object detection, striking a strong balance between accuracy and efficiency and setting the stage for future advances in fully integrated, high-performance autonomous driving systems.