NetTrack: Tracking Highly Dynamic Objects with a Net (2403.11186v1)

Published 17 Mar 2024 in cs.CV

Abstract: The complex dynamicity of open-world objects presents non-negligible challenges for multi-object tracking (MOT), often manifested as severe deformations, fast motion, and occlusions. Most methods that solely depend on coarse-grained object cues, such as boxes and the overall appearance of the object, are susceptible to degradation due to distorted internal relationships of dynamic objects. To address this problem, this work proposes NetTrack, an efficient, generic, and affordable tracking framework to introduce fine-grained learning that is robust to dynamicity. Specifically, NetTrack constructs a dynamicity-aware association with a fine-grained Net, leveraging point-level visual cues. Correspondingly, a fine-grained sampler and matching method have been incorporated. Furthermore, NetTrack learns object-text correspondence for fine-grained localization. To evaluate MOT in extremely dynamic open-world scenarios, a bird flock tracking (BFT) dataset is constructed, which exhibits high dynamicity with diverse species and open-world scenarios. Comprehensive evaluation on BFT validates the effectiveness of fine-grained learning on object dynamicity, and thorough transfer experiments on challenging open-world benchmarks, i.e., TAO, TAO-OW, AnimalTrack, and GMOT-40, validate the strong generalization ability of NetTrack even without finetuning. Project page: https://george-zhuang.github.io/nettrack/.
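To make the idea of associating objects through point-level cues more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the Net, the sampler, or the matching method, so everything below is an assumption for illustration. It assumes each existing track carries a handful of points that some point tracker has already propagated into the current frame, scores each track against each candidate box by the fraction of its points that land inside the box, and then pairs tracks with boxes greedily. The names `point_affinity`, `greedy_match`, and `min_affinity` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def point_in_box(p: Point, b: Box) -> bool:
    """Return True if point p lies inside (or on the border of) box b."""
    x, y = p
    x1, y1, x2, y2 = b
    return x1 <= x <= x2 and y1 <= y <= y2


def point_affinity(
    track_points: Dict[int, List[Point]], boxes: List[Box]
) -> Dict[Tuple[int, int], float]:
    """Score each (track id, box index) pair by the fraction of the
    track's propagated points that fall inside the candidate box."""
    aff: Dict[Tuple[int, int], float] = {}
    for tid, pts in track_points.items():
        for j, box in enumerate(boxes):
            if pts:
                inside = sum(point_in_box(p, box) for p in pts)
                aff[(tid, j)] = inside / len(pts)
            else:
                aff[(tid, j)] = 0.0
    return aff


def greedy_match(
    aff: Dict[Tuple[int, int], float], min_affinity: float = 0.5
) -> Dict[int, int]:
    """Greedy one-to-one matching (a simple stand-in, not the paper's
    matching method): take pairs in descending score order and skip any
    track or box that is already matched."""
    matches: Dict[int, int] = {}
    used_boxes = set()
    for (tid, j), score in sorted(aff.items(), key=lambda kv: kv[1], reverse=True):
        if score < min_affinity:
            break  # remaining pairs are all below the threshold
        if tid in matches or j in used_boxes:
            continue
        matches[tid] = j
        used_boxes.add(j)
    return matches


# Toy example: two tracks, two candidate boxes in the current frame.
track_points = {1: [(1, 1), (2, 2), (9, 9)], 2: [(11, 11), (12, 12)]}
boxes = [(10, 10, 15, 15), (0, 0, 5, 5)]
matches = greedy_match(point_affinity(track_points, boxes))
```

In this toy case track 2 pairs with the first box and track 1 with the second, even though one of track 1's points has drifted outside its box. That illustrates why several points per object can be more tolerant of deformation than a single box-overlap score. A globally optimal assignment, such as the Hungarian method, could replace the greedy step.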
