Lifting Multi-View Detection and Tracking to the Bird's Eye View (2403.12573v1)
Abstract: Taking advantage of multi-view aggregation is a promising way to tackle challenges such as occlusion and missed detections in multi-object detection and tracking. Recent advances in multi-view detection and 3D object recognition have significantly improved performance by projecting all views onto the ground plane and performing detection in the Bird's Eye View (BEV). In this paper, we compare modern lifting methods, both parameter-free and parameterized, for multi-view aggregation. Additionally, we present an architecture that aggregates features over multiple time steps to learn robust detection and combines appearance- and motion-based cues for tracking. Most current tracking approaches focus on either pedestrians or vehicles. In our work, we combine both domains and introduce new challenges to multi-view detection through cross-scene setups. Our method generalizes to three public datasets across two domains, (1) pedestrians: Wildtrack and MultiviewX, and (2) roadside perception: Synthehicle, and achieves state-of-the-art performance in detection and tracking. https://github.com/tteepe/TrackTacular
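The abstract contrasts parameter-free and parameterized lifting. As a rough illustration of the parameter-free end only, the sketch below lifts per-camera feature maps onto a ground-plane (z = 0) BEV grid: each BEV cell centre is projected into every camera with a 3x4 projection matrix P = K [R | t], the nearest feature vector is sampled, and the samples are averaged over the cameras that actually see the cell. The names `project` and `lift_to_bev`, the nearest-neighbour sampling, and the mean pooling are illustrative assumptions, not the TrackTacular implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical types for this sketch: a feature map is indexed [row][col][channel].
FeatureMap = List[List[List[float]]]
Projection = List[List[float]]  # 3x4 camera matrix P = K [R | t]


def project(P: Projection, x: float, y: float, z: float = 0.0) -> Optional[Tuple[float, float]]:
    """Project world point (x, y, z) to pixel (u, v); None if it lies behind the camera."""
    X = (x, y, z, 1.0)  # homogeneous world coordinates
    u, v, w = (sum(P[r][c] * X[c] for c in range(4)) for r in range(3))
    # Assuming K's last row is (0, 0, 1), w is the depth along the optical axis.
    if w <= 1e-9:
        return None
    return u / w, v / w


def lift_to_bev(
    features: Sequence[FeatureMap],
    projections: Sequence[Projection],
    xs: Sequence[float],
    ys: Sequence[float],
) -> List[List[List[float]]]:
    """Parameter-free lifting: average the features of all cameras that see each ground cell."""
    channels = len(features[0][0][0])
    bev = []
    for y in ys:
        row = []
        for x in xs:
            acc = [0.0] * channels
            views = 0  # number of cameras in which this ground point is visible
            for feat, P in zip(features, projections):
                uv = project(P, x, y)  # ground plane: z = 0
                if uv is None:
                    continue
                col = math.floor(uv[0] + 0.5)  # nearest-neighbour sampling
                r = math.floor(uv[1] + 0.5)
                if 0 <= r < len(feat) and 0 <= col < len(feat[0]):
                    acc = [a + f for a, f in zip(acc, feat[r][col])]
                    views += 1
            # Cells seen by no camera stay zero; otherwise take the mean over valid views.
            row.append([a / views for a in acc] if views else acc)
        bev.append(row)
    return bev
```

Mean pooling over the visible views keeps this lifting free of learned parameters; a parameterized lifting would instead learn how image features are placed into, or aggregated within, the BEV grid.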