
Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation (2201.01297v1)

Published 4 Jan 2022 in cs.CV

Abstract: Occlusion between different objects is a typical challenge in Multi-Object Tracking (MOT), which often leads to inferior tracking results due to the missing detected objects. The common practice in multi-object tracking is re-identifying the missed objects after their reappearance. Though tracking performance can be boosted by the re-identification, the annotation of identity is required to train the model. In addition, such practice of re-identification still can not track those highly occluded objects when they are missed by the detector. In this paper, we focus on online multi-object tracking and design two novel modules, the unsupervised re-identification learning module and the occlusion estimation module, to handle these problems. Specifically, the proposed unsupervised re-identification learning module does not require any (pseudo) identity information nor suffer from the scalability issue. The proposed occlusion estimation module tries to predict the locations where occlusions happen, which are used to estimate the positions of missed objects by the detector. Our study shows that, when applied to state-of-the-art MOT methods, the proposed unsupervised re-identification learning is comparable to supervised re-identification learning, and the tracking performance is further improved by the proposed occlusion estimation module.

Citations (76)

Summary

  • The paper introduces an unsupervised Re-ID learning module that forms identity associations via intra- and inter-frame similarities without requiring labeled data.
  • It presents an occlusion estimation module that predicts overlapping regions to recover occluded objects, improving detection in crowded scenes.
  • Experiments on MOTChallenge datasets show significant improvements in tracking metrics, such as higher MOTA and IDF1 scores, validating the approach.

Introduction

The paper "Online Multi-Object Tracking with Unsupervised Re-Identification Learning and Occlusion Estimation" (2201.01297) presents two novel modules designed to enhance online multi-object tracking (MOT) systems. Addressing the inherent challenges associated with occlusions and re-identification (Re-ID), the paper introduces an unsupervised Re-ID learning module alongside an occlusion estimation module. These additions aim to reduce the dependency on annotated identity information and improve tracking performance by identifying occluded objects.

Unsupervised Re-Identification Learning Module

The unsupervised Re-ID learning module uses the appearance similarity between objects in adjacent video frames to build associations without labeled identity information. It relies on two supervision signals:

  1. Strong Supervision: Objects within the same frame should not be associated with each other.
  2. Weak Supervision: Objects in adjacent frames are likely to share the same identity based on appearance.

The module uses a similarity matrix, S, measuring cosine similarity between object features. A dynamic placeholder is introduced to the assignment matrix, M', handling cases where objects appear or disappear between frames. The learning is guided by intra-frame losses, inter-frame margin losses, and cycle consistency constraint losses to optimize the association matrix (Figure 1).

Figure 1: The proposed unsupervised Re-ID learning method. It illustrates how identities are assigned between adjacent frames without explicit identity information.
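
The similarity matrix and placeholder described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `placeholder` value, the `temperature`, and the row-wise softmax are all assumptions made for illustration (the summary only says the placeholder is dynamic, not how it is computed), and the losses are omitted.

```python
import math


def cosine_similarity_matrix(feats_a, feats_b):
    """Return S where S[i][j] is the cosine similarity of feats_a[i] and feats_b[j]."""

    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    return [
        [sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b)) for b in feats_b]
        for a in feats_a
    ]


def assignment_with_placeholder(S, placeholder, temperature=0.1):
    """Append a placeholder column to S and apply a row-wise softmax.

    The last column of each returned row is the probability that the object
    has no match in the other frame (it disappeared or is new).
    Both `placeholder` and `temperature` are illustrative assumptions.
    """
    M = []
    for row in S:
        logits = [s / temperature for s in row + [placeholder]]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - top) for value in logits]
        total = sum(exps)
        M.append([e / total for e in exps])
    return M


# Hypothetical 2-D appearance features: two objects in frame t, one in frame t+1.
frame_t = [[1.0, 0.0], [0.0, 1.0]]
frame_t1 = [[0.9, 0.1]]
S = cosine_similarity_matrix(frame_t, frame_t1)
M = assignment_with_placeholder(S, placeholder=0.5)
```

In this toy input the first object of frame t resembles the only object of frame t+1, so most of its probability mass goes to that object, while the second object has no good match and most of its mass goes to the placeholder column.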

Occlusion Estimation Module

Occlusions pose significant challenges in MOT, often leading to missed detections. The occlusion estimation module predicts the locations of possible occlusions using a key-point estimation approach, so that lost objects can be recovered in subsequent frames (Figure 2). The module generates an occlusion heatmap by estimating the center of the overlap between bounding boxes. The tracking algorithm then combines predicted motion with these occlusion centers to recover occluded objects.

Figure 2: Typical occlusion cases. The translucent blue areas signify where occlusions occur, highlighted by red occlusion centers.
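
As a rough sketch of the idea of an occlusion center, the code below computes the center of the overlap between two boxes and renders a heatmap target from such centers. The box format `(x1, y1, x2, y2)`, the Gaussian rendering, and the `sigma` parameter are assumptions chosen for illustration, in the style of common key-point heatmaps, and are not taken from the paper.

```python
import math


def occlusion_center(box_a, box_b):
    """Return the center (x, y) of the overlap of two (x1, y1, x2, y2) boxes.

    Returns None when the boxes do not overlap.
    """
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def render_heatmap(centers, width, height, sigma=1.0):
    """Render a height x width heatmap with a Gaussian peak at each center.

    Overlapping peaks are merged with an element-wise maximum.
    """
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                heat[y][x] = max(heat[y][x], value)
    return heat


# Two hypothetical overlapping boxes; their overlap spans (2, 2) to (4, 4).
center = occlusion_center((0, 0, 4, 4), (2, 2, 6, 6))
heatmap = render_heatmap([center], width=8, height=8)
```

Here the overlap center is `(3.0, 3.0)`, so the heatmap peaks at row 3, column 3 and decays with distance from it.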

Implementation in Existing MOT Systems

These modules can be integrated into existing tracking systems such as FairMOT and CenterTrack. For FairMOT, the supervised Re-ID learning mechanism is replaced by the unsupervised approach, and the occlusion estimation module is added alongside the detection head. The unsupervised Re-ID improves scalability and reduces reliance on labeled data, while the occlusion estimation improves the handling of densely packed scenes by identifying occluded objects (Figure 3).

Figure 3: Application of the unsupervised Re-ID module and occlusion module to FairMOT.

Experimentation and Results

Experiments on the MOTChallenge datasets, including MOT16, MOT17, and MOT20, show improvements in tracking metrics such as MOTA, IDF1, and IDS (identity switches) when these modules are integrated. The results show fewer false negatives and higher tracking accuracy, because heavily occluded objects are successfully recovered (Figure 4).

Figure 4: Cases where lost objects are re-identified by the occlusion estimation module.
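
For readers unfamiliar with MOTA, the following sketch shows the standard definition, MOTA = 1 - (FN + FP + IDSW) / GT, which explains why fewer false negatives and identity switches raise the score. The function name and the counts in the example are hypothetical and are not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence; num_gt is the
    total number of ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts: recovering occluded objects lowers false negatives.
before = mota(false_negatives=20, false_positives=5, id_switches=3, num_gt=100)
after = mota(false_negatives=10, false_positives=5, id_switches=2, num_gt=100)
```

With these made-up counts, `after` is higher than `before` because both false negatives and identity switches decreased.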

Conclusion

The paper contributes to enhancing MOT by introducing two key modules that address occlusions and Re-ID without explicit labeling requirements. The methodology presents a scalable approach, applicable to real-world tracking scenarios and adaptable to large-scale video data. Future developments could focus on further optimizing these modules to handle increasingly complex scenes.

This work also underscores the potential of unsupervised learning in tracking systems, suggesting a shift away from conventional supervised techniques reliant on extensive annotated datasets, thereby broadening applicability across varied domains.
