RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving (2401.07322v2)

Published 14 Jan 2024 in cs.CV

Abstract: Road scene understanding is crucial in autonomous driving, enabling machines to perceive the visual environment. However, recent object detectors trained on datasets collected from particular geographical locations struggle to generalize to other locations. In this paper, we present RSUD20K, a new dataset for road scene understanding comprising over 20K high-resolution images captured from the driving perspective on Bangladesh roads, with 130K bounding box annotations across 13 object classes. This challenging dataset encompasses diverse road scenes, from narrow streets to highways, featuring objects from varied viewpoints as well as crowded environments with densely cluttered objects and varied weather conditions. Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity. We thoroughly examine the dataset, benchmarking various state-of-the-art object detectors and exploring large vision models as image annotators.
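The abstract describes a detection dataset (bounding-box annotations over 13 object classes) and benchmarking of object detectors. The sketch below illustrates two routine operations on such a dataset: counting boxes per class and computing intersection-over-union (IoU), the overlap measure underlying detection benchmarks. The annotation layout and class names here are hypothetical placeholders, assuming a COCO-style `[x, y, w, h]` record; the abstract does not specify RSUD20K's actual format or class list.

```python
# Hypothetical sketch: assumes COCO-style annotations; the record layout and
# class names below are illustrative, not RSUD20K's actual 13 classes.
from collections import Counter

CLASSES = ["person", "rickshaw", "bus"]  # placeholder class names

annotations = [
    {"image_id": 0, "category_id": 0, "bbox": [10, 20, 50, 80]},   # bbox = [x, y, w, h]
    {"image_id": 0, "category_id": 1, "bbox": [60, 30, 120, 90]},
    {"image_id": 1, "category_id": 1, "bbox": [5, 5, 200, 110]},
]

def class_distribution(anns, classes):
    """Count bounding boxes per class, a common first check on a detection dataset."""
    counts = Counter(a["category_id"] for a in anns)
    return {classes[i]: counts.get(i, 0) for i in range(len(classes))}

def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes, used to match
    predictions to ground truth when benchmarking detectors."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

print(class_distribution(annotations, CLASSES))
# {'person': 1, 'rickshaw': 2, 'bus': 0}
print(iou([0, 0, 10, 10], [5, 5, 10, 10]))  # 25 / 175 ≈ 0.1429
```

A real evaluation would aggregate such IoU matches into mean average precision (mAP) across classes and IoU thresholds, as standard detection benchmarks do.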

Citations (2)