Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving (2402.02026v2)

Published 3 Feb 2024 in cs.CV and cs.AI

Abstract: Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose a solution by reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes with lower training costs. By achieving a 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
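
To make the abstract's pipeline concrete, below is a minimal, hypothetical sketch of class-agnostic pseudo-labeling with fused multimodal objectness scores: a teacher scores region proposals with a vision-centric cue and an image-text cue, and proposals above a threshold become pseudo-boxes for semi-supervised training of the student detector. The function names, fusion rule, and threshold here are illustrative assumptions for exposition, not the released MENOL implementation.

import numpy as np

def vision_objectness(proposals: np.ndarray) -> np.ndarray:
    # Stand-in for a vision-centric objectness score per proposal
    # (e.g. derived from geometric cues such as depth or surface normals).
    return np.random.rand(len(proposals))

def text_objectness(proposals: np.ndarray) -> np.ndarray:
    # Stand-in for an image-text objectness score per proposal
    # (e.g. similarity of a cropped region to generic "object" prompts).
    return np.random.rand(len(proposals))

def fuse_scores(v: np.ndarray, t: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    # Convex combination of the two modality scores; alpha is an assumed hyper-parameter.
    return alpha * v + (1.0 - alpha) * t

def select_pseudo_labels(proposals: np.ndarray, threshold: float = 0.7):
    # Keep proposals whose fused objectness passes the threshold as
    # class-agnostic pseudo-boxes for student training.
    scores = fuse_scores(vision_objectness(proposals), text_objectness(proposals))
    keep = scores >= threshold
    return proposals[keep], scores[keep]

if __name__ == "__main__":
    # Toy (x1, y1, x2, y2) proposals on a 640x480 image.
    rng = np.random.default_rng(0)
    xy = rng.uniform(0, 400, size=(100, 2))
    wh = rng.uniform(20, 200, size=(100, 2))
    proposals = np.hstack([xy, xy + wh])
    boxes, scores = select_pseudo_labels(proposals)
    print(f"{len(boxes)} pseudo-labeled boxes kept for student training")

In the full framework the selected pseudo-boxes would supervise the student detector described in the abstract; here the two scorers are random placeholders meant only to show the data flow.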

