OVTrack: Open-Vocabulary Multiple Object Tracking (2304.08408v1)

Published 17 Apr 2023 in cs.CV

Abstract: The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object categories that hardly represent the multitude of possible objects that are encountered in the real world. This leaves contemporary MOT methods limited to a small set of pre-defined object categories. In this paper, we address this limitation by tackling a novel task, open-vocabulary MOT, that aims to evaluate tracking beyond pre-defined training categories. We further develop OVTrack, an open-vocabulary tracker that is capable of tracking arbitrary object classes. Its design is based on two key ingredients: First, leveraging vision-language models for both classification and association via knowledge distillation; second, a data hallucination strategy for robust appearance feature learning from denoising diffusion probabilistic models. The result is an extremely data-efficient open-vocabulary tracker that sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark, while being trained solely on static images. Project page: https://www.vis.xyz/pub/ovtrack/

Authors (6)

Siyuan Li
Tobias Fischer
Lei Ke
Henghui Ding
Martin Danelljan
Fisher Yu

