AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving (2309.06547v2)

Published 12 Sep 2023 in cs.CV

Abstract: Unlike humans, who can effortlessly estimate the entirety of objects even when they are partially occluded, modern computer vision algorithms still find this task extremely challenging. Leveraging such amodal perception for autonomous driving remains largely untapped due to the lack of suitable datasets. The curation of these datasets is hindered primarily by significant annotation costs and the difficulty of mitigating annotator subjectivity when labeling occluded regions. To address these limitations, we introduce AmodalSynthDrive, a synthetic multi-task, multi-modal amodal perception dataset. The dataset provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry for 150 driving sequences with over 1M object annotations in diverse traffic, weather, and lighting conditions. AmodalSynthDrive supports multiple amodal scene understanding tasks, including the newly introduced task of amodal depth estimation for enhanced spatial understanding. We evaluate several baselines for each of these tasks to illustrate the challenges and set up public benchmarking servers. The dataset is available at http://amodalsynthdrive.cs.uni-freiburg.de.
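
To make the described sample structure concrete, below is a minimal Python sketch of what a single multi-modal frame could look like, based only on the modalities named in the abstract (multi-view camera images, LiDAR, 3D bounding boxes, odometry, amodal masks, and amodal depth). All class names, fields, and path conventions here are illustrative assumptions, not the dataset's actual API or file format.

```python
# Hypothetical sketch (NOT the official AmodalSynthDrive API): a plausible record
# layout for one multi-modal frame, plus a helper that relates an instance's
# amodal mask to its visible mask.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Box3D:
    center: Tuple[float, float, float]  # x, y, z in the ego frame (assumed convention)
    size: Tuple[float, float, float]    # length, width, height in meters
    yaw: float                          # heading angle in radians


@dataclass
class AmodalInstance:
    category: str
    visible_mask: List[List[int]]       # binary H x W mask of visible pixels
    amodal_mask: List[List[int]]        # binary H x W mask including occluded extent
    box3d: Box3D


@dataclass
class Sample:
    camera_images: Dict[str, str]       # view name -> image path, e.g. "front", "left"
    lidar_path: str                     # point-cloud file for this frame
    odometry: Tuple[float, float, float, float, float, float]  # assumed (x, y, z, roll, pitch, yaw)
    amodal_depth_path: str              # per-pixel depth including occluded surfaces
    instances: List[AmodalInstance]


def occlusion_ratio(inst: AmodalInstance) -> float:
    """Fraction of the amodal extent that is hidden: 1 - |visible| / |amodal|."""
    visible = sum(sum(row) for row in inst.visible_mask)
    amodal = sum(sum(row) for row in inst.amodal_mask)
    return (1.0 - visible / amodal) if amodal else 0.0


# Toy 2x2 example: the car's amodal mask covers 4 pixels, but only 2 are visible.
inst = AmodalInstance(
    category="car",
    visible_mask=[[1, 0], [1, 0]],
    amodal_mask=[[1, 1], [1, 1]],
    box3d=Box3D(center=(10.0, 2.0, 0.0), size=(4.5, 1.8, 1.5), yaw=0.0),
)
print(occlusion_ratio(inst))  # 0.5 -> half of the object is occluded
```

The helper captures the core idea behind amodal annotations: the amodal mask is a superset of the visible mask, and the gap between the two quantifies how much of an object the annotation recovers despite occlusion.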
