DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction (2403.02075v2)
Abstract: In Multiple Object Tracking (MOT), objects often exhibit non-linear motion, with acceleration, deceleration, and irregular direction changes. Tracking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations where multiple objects perform non-linear and diverse motion simultaneously. To tackle complex non-linear motion, we propose a real-time diffusion-based MOT approach named DiffMOT. Specifically, for the motion predictor component, we propose a novel Decoupled Diffusion-based Motion Predictor (D$^2$MP). It models the entire distribution of the various motions present in the data as a whole. It also predicts an individual object's motion conditioned on that object's historical motion information. Furthermore, it optimizes the diffusion process to use far fewer sampling steps. As an MOT tracker, DiffMOT runs in real time at 22.7 FPS and also outperforms the state of the art on the DanceTrack and SportsMOT datasets, with HOTA scores of $62.3\%$ and $76.2\%$, respectively. To the best of our knowledge, DiffMOT is the first to introduce a diffusion probabilistic model into MOT to tackle non-linear motion prediction.
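To make the abstract's motivation concrete, the following minimal sketch shows how a linear motion prior falls behind an accelerating object. It is not the paper's method and not a full Kalman Filter: `constant_velocity_predict` and `accelerating_track` are hypothetical helpers, and the trajectory is synthetic, chosen only to illustrate the constant-velocity assumption that Kalman-style predictors typically rely on.

```python
# Illustrative sketch only: a constant-velocity extrapolator (the linear
# assumption behind typical Kalman-style motion models), NOT DiffMOT's D^2MP.
# All names and the synthetic trajectory are assumptions made for this example.

def constant_velocity_predict(history):
    """Extrapolate the next (x, y) position from the last two observations."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def accelerating_track(n_frames, accel=2.0):
    """Synthetic track moving along x with constant acceleration: x = 0.5*a*t^2."""
    return [(0.5 * accel * t * t, 0.0) for t in range(n_frames)]


def prediction_errors(track):
    """Absolute x-error of the one-step-ahead prediction at each frame t >= 2."""
    errors = []
    for t in range(2, len(track)):
        predicted = constant_velocity_predict(track[:t])
        errors.append(abs(predicted[0] - track[t][0]))
    return errors


# Under constant acceleration the linear predictor lags by a fixed amount
# every frame; on a truly linear track the error vanishes.
accel_errors = prediction_errors(accelerating_track(6))
linear_errors = prediction_errors([(3.0 * t, 0.0) for t in range(6)])
print(accel_errors)
print(linear_errors)
```

In this toy setting the per-frame error equals the acceleration, so it never shrinks no matter how long the object is observed. This is the kind of systematic miss on non-linear motion that a learned predictor is intended to reduce.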