Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving (arXiv:2402.02026v2)

Published 3 Feb 2024 in cs.CV and cs.AI

Abstract: Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose a solution by reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes with lower training costs. By achieving a 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
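
The abstract describes the teacher-student transfer only in general terms. As a minimal sketch, assuming a confident-proposal pseudo-labeling scheme, one semi-supervised training step might look like the code below; the `teacher`/`student` interfaces, the 0.7 confidence threshold, and the `w_unsup` loss weight are hypothetical stand-ins, not MENOL's actual implementation.

```python
# Hypothetical sketch (not the authors' code) of the semi-supervised
# objectness transfer described in the abstract: a multimodal teacher
# pseudo-labels unlabeled driving images, and a student detector trains
# on the small labeled set plus those class-agnostic pseudo-boxes.
import torch

@torch.no_grad()
def pseudo_label(teacher, images, score_thresh=0.7):
    """Teacher proposes class-agnostic objectness boxes on unlabeled images."""
    boxes, scores = teacher(images)   # assumed interface: (N, 4) boxes, (N,) scores
    keep = scores > score_thresh      # keep only confident proposals
    return boxes[keep], scores[keep]

def semi_supervised_step(student, teacher, labeled, unlabeled, opt, w_unsup=1.0):
    imgs_l, targets_l = labeled       # small labeled set (5,100 images in the paper)
    imgs_u = unlabeled                # unlabeled driving images

    # Supervised detection loss on the labeled data (assumed `loss` interface).
    loss_sup = student.loss(imgs_l, targets_l)

    # Objectness knowledge distilled from the teacher: confident proposals
    # become class-agnostic targets for the student.
    pl_boxes, _ = pseudo_label(teacher, imgs_u)
    loss_unsup = student.loss(imgs_u, {"boxes": pl_boxes})

    loss = loss_sup + w_unsup * loss_unsup
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)
```

Judging from the reference list, the multimodal side of the teacher plausibly combines geometric cues (refs. 12, 14) with image-text models (refs. 6, 13, 21); that fusion is omitted from the sketch, and the authors' actual code is at the GitHub link in the abstract.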

References (22)
  1. “You only look once: Unified, real-time object detection,” in CVPR, 2016, pp. 779–788.
  2. “Scalable object detection using deep neural networks,” in CVPR, 2014, pp. 2147–2154.
  3. “CODA: A real-world road corner case dataset for object detection in autonomous driving,” in ECCV, 2022, pp. 406–423.
  4. “Towards open world object detection,” in CVPR, 2021, pp. 5830–5840.
  5. “OW-DETR: Open-world detection transformer,” in CVPR, 2022, pp. 9235–9244.
  6. “Learning transferable visual models from natural language supervision,” in ICML, 2021, pp. 8748–8763.
  7. “Open-vocabulary object detection using captions,” in CVPR, 2021, pp. 14393–14402.
  8. “Improved visual-semantic alignment for zero-shot object detection,” in AAAI, 2020, vol. 34, pp. 11932–11939.
  9. “Out-of-distribution detection for automotive perception,” in ITSC, 2021, pp. 2938–2943.
  10. “Pixel-wise anomaly detection in complex driving scenes,” in CVPR, 2021, pp. 16918–16927.
  11. “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, vol. 28, 2015.
  12. “GOOD: Exploring geometric cues for detecting objects in an open world,” arXiv preprint arXiv:2212.11720, 2022.
  13. “Grounded language-image pre-training,” in CVPR, 2022, pp. 10965–10975.
  14. “Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3D scans,” in ICCV, 2021, pp. 10786–10796.
  15. “Learning open-world object proposals without learning to classify,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5453–5460, 2022.
  16. “Open-set semi-supervised object detection,” in ECCV, 2022, pp. 143–159.
  17. “DINO: DETR with improved denoising anchor boxes for end-to-end object detection,” arXiv preprint arXiv:2203.03605, 2022.
  18. “SODA10M: A large-scale 2D self/semi-supervised object detection dataset for autonomous driving,” arXiv preprint arXiv:2106.11118, 2021.
  19. “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” in CVPR, 2020, pp. 2636–2645.
  20. Laurens van der Maaten and Geoffrey Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 86, pp. 2579–2605, 2008.
  21. “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in ICML, 2022, pp. 12888–12900.
  22. “YOLOP: You only look once for panoptic driving perception,” Machine Intelligence Research, vol. 19, no. 6, pp. 550–562, 2022.
