
Lifting Multi-View Detection and Tracking to the Bird's Eye View (2403.12573v1)

Published 19 Mar 2024 in cs.CV

Abstract: Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detection in multi-object tracking and detection. Recent advancements in multi-view detection and 3D object recognition have significantly improved performance by strategically projecting all views onto the ground plane and conducting detection analysis from a Bird's Eye View. In this paper, we compare modern lifting methods, both parameter-free and parameterized, to multi-view aggregation. Additionally, we present an architecture that aggregates the features of multiple time steps to learn robust detection and combines appearance- and motion-based cues for tracking. Most current tracking approaches focus on either pedestrians or vehicles. In our work, we combine both branches and add new challenges to multi-view detection with cross-scene setups. Our method generalizes to three public datasets across two domains: (1) pedestrian: Wildtrack and MultiviewX, and (2) roadside perception: Synthehicle, achieving state-of-the-art performance in detection and tracking. https://github.com/tteepe/TrackTacular
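
The ground-plane projection mentioned in the abstract can be illustrated with a small geometric sketch. It is not taken from the paper or its repository. It assumes a pinhole camera with intrinsics `K`, rotation `R`, and translation `t`, and a ground plane at z = 0, in which case ground points map to pixels through a 3x3 homography. The function names `ground_homography` and `project_ground_points` are hypothetical and chosen only for this example.

```python
import numpy as np


def ground_homography(K, R, t):
    """Build the 3x3 homography that maps ground-plane points (x, y, 1),
    with z = 0, to homogeneous pixel coordinates.

    Assumes a pinhole model: pixel ~ K @ (R @ X + t). For z = 0 the third
    column of R drops out, leaving K @ [r1 r2 t].
    """
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def project_ground_points(H, points_xy):
    """Project N ground-plane points of shape (N, 2) to pixel coordinates."""
    pts = np.asarray(points_xy, dtype=float)
    homog = np.column_stack((pts, np.ones(len(pts))))  # (N, 3)
    proj = homog @ H.T                                 # (N, 3)
    return proj[:, :2] / proj[:, 2:3]                  # divide by depth


# Illustrative camera: focal length 100 px, principal point (50, 50),
# looking straight at the ground plane from 5 units away.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
H = ground_homography(K, np.eye(3), np.array([0.0, 0.0, 5.0]))
pixels = project_ground_points(H, [[1.0, 2.0], [0.0, 0.0]])
```

In a parameter-free lifting scheme of this kind, each Bird's Eye View grid cell would be projected into every camera this way and the image features at the resulting pixel locations sampled and aggregated across views. The sampling and aggregation steps are omitted here.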
