Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking (2405.18606v1)

Published 28 May 2024 in cs.CV, cs.IT, and math.IT

Abstract: We propose a 3D multi-object tracking (MOT) solution using only 2D detections from monocular cameras, which automatically initiates/terminates tracks as well as resolves track appearance-reappearance and occlusions. Moreover, this approach does not require detector retraining when cameras are reconfigured but only the camera matrices of reconfigured cameras need to be updated. Our approach is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter that utilizes all these functionalities is numerically intractable due to the exponentially growing number of terms in the (multi-object) filtering density, while existing approximations trade-off some of these functionalities for speed. To this end, we develop a more efficient approximation suitable for online MOT by incorporating object features and kinematics into the measurement model, which improves data association and subsequently reduces the number of terms. Specifically, we exploit the 2D detections and extracted features from multiple cameras to provide a better approximation of the multi-object filtering density to realize the track initiation/termination and re-identification functionalities. Further, incorporating a tractable geometric occlusion model based on 2D projections of 3D objects on the camera planes realizes the occlusion handling functionality of the filter. Evaluation of the proposed solution on challenging datasets demonstrates significant improvements and robustness when camera configurations change on-the-fly, compared to existing multi-view MOT solutions. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.

Summary

The paper presents a Bayesian multi-view filter that integrates multi-object dynamic and measurement models to enable efficient online 3D track initiation, termination, and re-identification.
The approach uses an adaptive birth model to initiate and re-identify tracks, and an occlusion model based on bounding-box overlaps to maintain track continuity in cluttered scenes.
Experimental results on benchmark datasets show significant improvements in tracking accuracy and resilience against camera reconfigurations.

Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking

This paper addresses 3D multi-object tracking (MOT) from the 2D detections of monocular cameras, with tracks that are automatically initiated, terminated, and re-identified while occlusions are handled. The proposed solution integrates multi-object dynamic and measurement models into a Bayesian filtering framework intended for practical, real-world tracking applications.

Proposed Solution Overview

The authors propose a 3D multi-view MOT (MV-MOT) solution that follows the tracking-by-detection paradigm, using 2D detections from monocular cameras to track objects in 3D. At its core is a Bayesian multi-object formulation that performs track initiation/termination, track re-identification, occlusion handling, and data association within a single Bayes filtering recursion. The exact filter is numerically intractable because the number of terms in the multi-object filtering density grows exponentially. The authors therefore develop an approximation suitable for online MOT: incorporating object features and kinematics into the measurement model improves data association, which in turn reduces the number of terms that must be retained.
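To make the exponential growth concrete, the short Python sketch below counts the data-association hypotheses for a fixed set of tracks in one filtering step: in each camera, every track is either missed or takes one detection that no other track uses, and independent cameras multiply the per-camera counts. It deliberately ignores births, deaths, and the accumulation of hypotheses over time, all of which make the real count larger; the function names are illustrative only.

```python
from math import comb, perm


def single_camera_hypotheses(n_tracks: int, n_dets: int) -> int:
    """Count associations in one camera: each track is missed or takes one
    detection, and no detection is shared between tracks."""
    # choose k detected tracks, then give them k distinct detections (ordered)
    return sum(comb(n_tracks, k) * perm(n_dets, k)
               for k in range(min(n_tracks, n_dets) + 1))


def multi_camera_hypotheses(n_tracks: int, dets_per_camera: list[int]) -> int:
    """Joint multi-camera associations: per-camera counts multiply."""
    total = 1
    for m in dets_per_camera:
        total *= single_camera_hypotheses(n_tracks, m)
    return total


# n tracks and n detections in each of four cameras
for n in (2, 5, 10):
    print(n, multi_camera_hypotheses(n, [n] * 4))
```

Even two tracks with two detections in each of four cameras give 7^4 = 2401 joint associations, which is why practical filters sample or truncate hypotheses rather than enumerate them.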

Figure 1: Schematic of the proposed 3D MV-MOT solution. Multi-view detections (bounding boxes and visual features from all cameras) are supplied to the MV-MOT filter, integrating multi-object dynamic and measurement models to realize all MOT functionalities.
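A sketch of how a per-camera measurement model might fuse kinematics (bounding boxes) and appearance (visual features) into association weights follows. The isotropic Gaussian on box corners, the exponential-of-cosine appearance term, and every parameter value (`p_d`, `box_std`, `kappa`, `clutter`) are assumptions for illustration, not the paper's exact likelihood. Column 0 of the result holds the missed-detection weight; column j holds the weight of associating the track with detection j.

```python
import numpy as np


def association_weights(pred_boxes, track_feats, det_boxes, det_feats,
                        p_d=0.9, box_std=10.0, kappa=10.0, clutter=1e-12):
    """Return a (P, M + 1) array of weights for P tracks and M detections.

    Column 0: 1 - p_d (track missed). Column j: p_d * g(z_j | track) / clutter.
    All parameter values are hypothetical; `clutter` is an intensity per unit
    of 4-D box space (e.g. about 10 false boxes spread over a full-HD image).
    """
    pred_boxes = np.asarray(pred_boxes, dtype=float)  # (P, 4) [x1, y1, x2, y2]
    det_boxes = np.asarray(det_boxes, dtype=float)    # (M, 4)
    tf = np.asarray(track_feats, dtype=float)
    df = np.asarray(det_feats, dtype=float)
    tf = tf / np.linalg.norm(tf, axis=1, keepdims=True)
    df = df / np.linalg.norm(df, axis=1, keepdims=True)

    # kinematic term: log of an isotropic 4-D Gaussian on the box corners
    diff = pred_boxes[:, None, :] - det_boxes[None, :, :]          # (P, M, 4)
    log_kin = (-0.5 * np.sum(diff ** 2, axis=2) / box_std ** 2
               - 4 * np.log(np.sqrt(2 * np.pi) * box_std))
    # appearance term: kappa * (cosine similarity - 1), always <= 0
    log_app = kappa * (tf @ df.T - 1.0)                             # (P, M)

    eta = np.empty((len(pred_boxes), len(det_boxes) + 1))
    eta[:, 0] = 1.0 - p_d
    eta[:, 1:] = p_d * np.exp(log_kin + log_app) / clutter
    return eta
```

Because a mismatched feature scales a weight down by a factor of up to exp(-2 * kappa), implausible pairings carry negligible weight even when boxes happen to overlap, which is how features can keep the number of significant hypotheses small.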

Bayesian Multi-View MOT Filter

The proposed filter combines geometric projections, which drive occlusion handling, with an adaptive birth model, which drives track initiation and re-identification. The multi-view Bayesian tracking framework employs numerical approximations of the generalized labeled multi-Bernoulli (GLMB) filter so that state estimation and track management are feasible in practice. The filter's complexity is linear in the total number of detections across all cameras, which supports efficient online operation; when cameras are reconfigured, only their camera matrices need to be updated, with no detector retraining.
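One standard way to obtain per-iteration cost linear in the number of detections is Gibbs sampling over association vectors. The sketch below is simplified under a stated assumption: the set of existing tracks is held fixed, whereas GLMB Gibbs samplers also sample which tracks survive. For each camera and track it resamples the association from the weights in `etas` (a hypothetical input, for example detection probability times likelihood over clutter intensity), forbidding detections already claimed by another track in that camera. One sweep costs on the order of C·P² + P·ΣM_c for P tracks, C cameras, and M_c detections in camera c.

```python
import numpy as np


def gibbs_associations(etas, num_sweeps=100, seed=0):
    """Sample multi-camera association vectors with a Gibbs sampler.

    etas: list with one (P, M_c + 1) array per camera. Column 0 is the weight
    of "track i missed in camera c" and must be > 0; column j > 0 is the weight
    of "track i generated detection j in camera c".
    Returns the set of distinct joint associations visited.
    """
    rng = np.random.default_rng(seed)
    num_tracks = etas[0].shape[0]
    # the all-missed assignment is always valid, so start there
    gamma = [np.zeros(num_tracks, dtype=int) for _ in etas]
    visited = set()
    for _ in range(num_sweeps):
        for eta, g in zip(etas, gamma):          # g is updated in place
            for i in range(num_tracks):
                w = eta[i].copy()
                # a detection may be used by at most one track per camera
                others = np.delete(g, i)
                w[others[others > 0]] = 0.0
                g[i] = rng.choice(len(w), p=w / w.sum())
        visited.add(tuple(tuple(int(v) for v in gc) for gc in gamma))
    return visited
```

The distinct samples play the role of the retained hypotheses; weights that are sharply peaked make the sampler revisit the same few associations, so fewer distinct terms are produced.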

Implementation and Adaptive Models

Occlusion Handling

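The occlusion model accounts for partial and complete occlusions by evaluating how much the projected bounding boxes of objects overlap on each camera's image plane. Each object's detection probability in a camera is adjusted according to its occlusion score, so a missed detection of an occluded object is expected rather than treated as evidence that the object has disappeared, which helps maintain tracks in cluttered environments.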
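The scoring below is a hypothetical instantiation of such an occlusion model, not the paper's formula: in one camera, each object's projected box is compared only with the boxes of objects closer to that camera, the occlusion score is the largest fraction of the box covered by any single such box, and the nominal detection probability `p_max` is scaled by one minus that score and clipped to `[p_min, p_max]`.

```python
import numpy as np


def occlusion_scores(boxes, depths):
    """Fraction of each projected box covered by the most-overlapping box of an
    object closer to the camera (0 = fully visible, 1 = fully covered)."""
    boxes = np.asarray(boxes, dtype=float)    # (N, 4) as [x1, y1, x2, y2]
    depths = np.asarray(depths, dtype=float)  # (N,) distance from the camera
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    scores = np.zeros(len(boxes))
    for i in range(len(boxes)):
        for j in range(len(boxes)):
            if j == i or depths[j] >= depths[i]:
                continue  # only objects in front of i can occlude it
            w = min(boxes[i, 2], boxes[j, 2]) - max(boxes[i, 0], boxes[j, 0])
            h = min(boxes[i, 3], boxes[j, 3]) - max(boxes[i, 1], boxes[j, 1])
            if w > 0 and h > 0:
                scores[i] = max(scores[i], w * h / areas[i])
    return scores


def detection_probabilities(boxes, depths, p_max=0.95, p_min=0.05):
    """Scale the nominal detection probability down by the occlusion score."""
    return np.clip(p_max * (1.0 - occlusion_scores(boxes, depths)), p_min, p_max)
```

Taking the largest single overlap instead of the area of the union is a simplification that underestimates occlusion when several objects each cover part of a box. Keeping `p_min` above zero leaves room for a heavily occluded object to still be detected.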

Figure 2: Schematic of the proposed multi-view MOT filter, showing the integration of Adaptive Birth Model and Occlusion Model for realizing MOT functionalities.

Track Initialization and Re-Identification

Track initialization and re-identification are achieved through an adaptive birth model that generates labels for newly appearing or reappearing objects by clustering sensor data. The model not only initiates new tracks but also restores terminated ones based on visual feature similarity.
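Re-identification can be sketched as a label-assignment step. The function below is hypothetical: it uses greedy cosine-similarity matching with an assumed `threshold` in place of the paper's clustering. An unassigned detection inherits the label of the most similar dormant (recently terminated) track if the similarity clears the threshold, and otherwise receives a fresh label.

```python
import numpy as np


def assign_birth_labels(det_feats, dormant_labels, dormant_feats,
                        next_label, threshold=0.7):
    """Return (labels, next_label): one label per unassigned detection, reusing
    a dormant track's label when appearance features match (hypothetical rule)."""
    available = dict(zip(dormant_labels,
                         (np.asarray(f, dtype=float) for f in dormant_feats)))
    labels = []
    for feat in det_feats:
        feat = np.asarray(feat, dtype=float)
        feat = feat / np.linalg.norm(feat)
        best_label, best_sim = None, threshold
        for lab, g in available.items():
            sim = float(feat @ g / np.linalg.norm(g))  # cosine similarity
            if sim >= best_sim:
                best_label, best_sim = lab, sim
        if best_label is not None:
            labels.append(best_label)   # re-identified: restore the old label
            del available[best_label]   # each dormant track is reused at most once
        else:
            labels.append(next_label)   # genuinely new object
            next_label += 1
    return labels, next_label
```

Reusing the dormant label, rather than issuing a new one, is what keeps an object's identity continuous across a disappearance and reappearance.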

Figure 3: Illustration of how detection probability varies with the overlap between tracks and with their distance from the camera.

Adaptive Birth Model Parameters

The adaptive birth model is statistical: the filter estimates and updates its parameters online, and compares feature vectors for similarity so that new tracks are initialized accurately and tentatively terminated tracks can be recalled.
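As a point of reference, a widely used measurement-driven adaptive birth rule for labeled multi-Bernoulli and GLMB filters sets each measurement's birth existence probability from how poorly existing tracks explain it: r_B(z) = min(r_max, λ_B (1 - r_U(z)) / Σ_ξ (1 - r_U(ξ))), where r_U(z) is the probability that z is associated with an existing track and λ_B is the expected number of births per frame. The sketch below implements that rule; whether the paper uses this exact form is an assumption, and in the multi-view case the "measurements" would first have to be groups of detections matched across cameras.

```python
import numpy as np


def adaptive_birth_probabilities(r_assoc, expected_births=0.1, r_max=0.5):
    """Birth existence probability per measurement.

    r_assoc[j]: probability that measurement j is explained by an existing track.
    r_B(z) = min(r_max, expected_births * (1 - r_U(z)) / sum(1 - r_U)).
    """
    unexplained = 1.0 - np.asarray(r_assoc, dtype=float)
    total = unexplained.sum()
    if total <= 0.0:
        return np.zeros_like(unexplained)  # every measurement is fully explained
    return np.minimum(r_max, unexplained / total * expected_births)
```

Because r_assoc is recomputed every frame from the current association weights, births concentrate where the data are poorly explained, which is what makes the birth model adaptive rather than fixed at predefined locations.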

Experimental Results

The proposed MV-MOT filter was evaluated on datasets including WILDTRACK and Curtin multi-camera (CMC) to test its robustness across tracking scenarios. Compared with existing multi-view MOT solutions, it showed substantial accuracy improvements and resilience when camera configurations change on the fly.

Figure 4: 3D ellipsoid estimates from the proposed MV-GLMB-AB filter utilizing CSTrack detection inputs, with projections on respective camera planes.
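Figure 4's projections of 3D ellipsoids onto camera planes can be reproduced with the standard dual-quadric construction from projective geometry, sketched below under assumptions: the ellipsoid is axis-aligned, lies entirely in front of the camera, and the camera is described by a 3x4 projection matrix P. The only camera-specific input is P, which illustrates why reconfiguring a camera requires only an updated camera matrix.

```python
import numpy as np


def ellipsoid_bbox(P, center, semi_axes):
    """Tight 2D box [u_min, v_min, u_max, v_max] of an axis-aligned ellipsoid
    seen by a camera with 3x4 projection matrix P (ellipsoid assumed in front)."""
    T = np.eye(4)
    T[:3, 3] = center
    # dual quadric of the ellipsoid: Q* = T diag(a^2, b^2, c^2, -1) T^T
    Q_dual = T @ np.diag([*np.square(semi_axes), -1.0]) @ T.T
    C = P @ Q_dual @ P.T  # dual conic of the image outline, defined up to scale

    def extent(k):
        # the image line x_k = t is tangent iff C[k,k] - 2t C[k,2] + t^2 C[2,2] = 0
        disc = np.sqrt(C[k, 2] ** 2 - C[k, k] * C[2, 2])
        return sorted(((C[k, 2] - disc) / C[2, 2], (C[k, 2] + disc) / C[2, 2]))

    (u0, u1), (v0, v1) = extent(0), extent(1)
    return np.array([u0, v0, u1, v1])


# unit sphere 10 units in front of a normalized camera P = [I | 0]
P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
print(ellipsoid_bbox(P0, (0.0, 0.0, 10.0), (1.0, 1.0, 1.0)))  # about ±0.1005 per coordinate
```

Moving or re-aiming a camera changes only P (for example K @ [R | t]); the 3D ellipsoid and the function stay the same.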

Impact and Future Directions

This research advances current MOT systems, particularly for deploying real-time monitoring without extensive computational resources. Resolving track appearance and reappearance within the filter lets it keep a consistent identity for objects that leave and re-enter the scene. Future work may focus on improving feature extraction from monocular inputs, increasing computational efficiency through better optimization algorithms, and incorporating other sensor modalities.

Conclusion

The paper presents an approach to online 3D MV-MOT that integrates multi-object dynamic and measurement models in a single Bayes recursion and reduces the cost of approximating it. The results suggest it can sustain robust tracking in dynamic environments where sensor configurations change frequently, opening avenues for practical object tracking applications.