Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving (2402.02026v2)
Abstract: Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, the Multimodal-Enhanced Objectness Learner (MENOL) for corner case detection, significantly improves recall on novel classes at lower training cost. With just 5100 labeled training images, MENOL achieves 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset, outperforming the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
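The abstract evaluates with class-agnostic average recall (mAR-agnostic), i.e., the fraction of ground-truth boxes recovered by the detector when class labels are ignored. As a minimal sketch of that idea (not the paper's evaluation code), the toy functions below greedily match predicted boxes to ground-truth boxes at an assumed IoU threshold of 0.5; all names and values here are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def class_agnostic_recall(preds, gts, thr=0.5):
    """Fraction of ground-truth boxes matched by some prediction at
    IoU >= thr, ignoring class labels entirely (each prediction may
    match at most one ground-truth box)."""
    used = set()
    matched = 0
    for g in gts:
        best_iou, best_idx = 0.0, None
        for i, p in enumerate(preds):
            if i in used:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_idx = v, i
        if best_idx is not None and best_iou >= thr:
            matched += 1
            used.add(best_idx)
    return matched / len(gts) if gts else 1.0

# Toy example: one ground-truth box is recovered, one is missed.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (100, 100, 110, 110)]
print(class_agnostic_recall(preds, gts))  # → 0.5
```

Averaging this recall over images (and over several IoU thresholds) gives an mAR-style number; corner-case-only ground truth yields the mAR-corner variant.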