Learning Discriminative Model Prediction for Tracking (1904.07220v2)

Published 15 Apr 2019 in cs.CV

Abstract: The current strive towards end-to-end trainable computer vision systems imposes major challenges for the task of visual tracking. In contrast to most other vision problems, tracking requires the learning of a robust target-specific appearance model online, during the inference stage. To be end-to-end trainable, the online learning of the target model thus needs to be embedded in the tracking architecture itself. Due to the imposed challenges, the popular Siamese paradigm simply predicts a target feature template, while ignoring the background appearance information during inference. Consequently, the predicted model possesses limited target-background discriminability. We develop an end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction. Our architecture is derived from a discriminative learning loss by designing a dedicated optimization process that is capable of predicting a powerful model in only a few iterations. Furthermore, our approach is able to learn key aspects of the discriminative loss itself. The proposed tracker sets a new state-of-the-art on 6 tracking benchmarks, achieving an EAO score of 0.440 on VOT2018, while running at over 40 FPS. The code and models are available at https://github.com/visionml/pytracking.

Authors (4)

Goutam Bhat
Martin Danelljan
Luc Van Gool
Radu Timofte

Citations (961)

Summary

The paper introduces a novel end-to-end tracker that incorporates target and background information through a discriminative loss for robust tracking.
It formulates model prediction as an optimization problem, solved by an initializer network followed by a steepest descent-based module that converges in only a few iterations.
The proposed DiMP-50 model achieves state-of-the-art scores on multiple benchmarks, demonstrating strong generalization and effective target-background discrimination.

Learning Discriminative Model Prediction for Tracking

In "Learning Discriminative Model Prediction for Tracking," Bhat et al. address the challenging problem of visual object tracking, focusing on the difficulty of robustly distinguishing target objects from the background. The paper presents a novel end-to-end trainable tracking architecture that effectively predicts a target model by leveraging both the target and the surrounding background information.

The core innovation of this work lies in its discriminative learning foundation, designed to overcome the limitations of the popular Siamese tracking paradigm. Unlike traditional Siamese approaches that typically rely only on the target's appearance, the proposed method integrates background appearance information during inference, enhancing the discriminative power of the target model.

Methodology

The proposed architecture comprises three key components designed to work together:

Discriminative Learning Loss: The authors formulate a least-squares loss over residuals that combine spatially varying weights with a hinge-like structure, which accommodates the data imbalance between target and background samples. Key components of this loss are themselves learned during training, so that the loss penalizes errors on both target and background regions in a way that maximizes the discriminative ability of the predicted model.
Model Prediction Architecture: The target model prediction is framed as an optimization problem. The architecture features an initializer network and a steepest descent-based optimizer module. The initializer provides a rough estimate of the target model, which the optimizer then refines with gradient steps whose step length is computed from a quadratic (Gauss-Newton) approximation of the loss, so that a strong model is reached in only a few iterations.
End-to-End Training: The entire tracking framework, including the backbone feature extractor, is trained end-to-end. Training operates on sets of frames sampled from video sequences: the target model is predicted from one set of frames and evaluated on other frames of the same sequence, so the model learns to generalize to unseen frames and sequences.
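The optimizer step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain linear model (scores = X @ f) in place of a convolutional filter, fixed hand-chosen arrays for the label, mask, and weight in place of the learned ones, and a zero initial model in place of the initializer network. All function and variable names are hypothetical. What it does show is a hinge-like residual and a steepest descent update whose step length comes from a Gauss-Newton quadratic approximation of the loss.

```python
import numpy as np


def residuals(scores, labels, mask, weight):
    """Hinge-like residual (illustrative form).

    Where mask is 1 (target region) the score is regressed to the label;
    where mask is 0 (background) only positive scores are penalised.
    """
    hinged = mask * scores + (1.0 - mask) * np.maximum(scores, 0.0)
    return weight * (hinged - labels)


def loss(f, X, labels, mask, weight, lam):
    """Sum of squared residuals plus an L2 penalty on the model f."""
    r = residuals(X @ f, labels, mask, weight)
    return float(r @ r + lam ** 2 * (f @ f))


def steepest_descent(f0, X, labels, mask, weight, lam, num_iter=5):
    """Gradient steps with a closed-form step length.

    The step length minimises a quadratic (Gauss-Newton) approximation
    of the loss along the negative gradient direction.
    """
    f = np.asarray(f0, dtype=float).copy()
    for _ in range(num_iter):
        scores = X @ f
        r = residuals(scores, labels, mask, weight)
        # Derivative of each residual w.r.t. its score (hinge is piecewise linear).
        slope = weight * (mask + (1.0 - mask) * (scores > 0))
        J = slope[:, None] * X                      # Jacobian of residuals w.r.t. f
        grad = 2.0 * (J.T @ r + lam ** 2 * f)       # gradient of the loss
        Jg = J @ grad
        # grad^T Q grad with Q = 2 (J^T J + lam^2 I), the Gauss-Newton Hessian.
        curvature = 2.0 * (Jg @ Jg + lam ** 2 * (grad @ grad))
        if curvature <= 1e-12:
            break
        alpha = (grad @ grad) / curvature           # optimal step for the quadratic model
        f = f - alpha * grad
    return f


# Toy data: 50 feature vectors, the first 10 treated as "target" samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
mask = (np.arange(50) < 10).astype(float)
labels = mask.copy()
weight = np.ones(50)
f_init = np.zeros(8)
f_final = steepest_descent(f_init, X, labels, mask, weight, lam=0.1)
print(loss(f_init, X, labels, mask, weight, 0.1), loss(f_final, X, labels, mask, weight, 0.1))
```

Because the step length is computed in closed form rather than fixed by hand, each iteration takes the best step the quadratic model allows, which is the intuition behind reaching a usable model in a handful of iterations.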

Experimental Evaluation

The paper provides an extensive evaluation across seven established tracking benchmarks: VOT2018, LaSOT, TrackingNet, GOT10k, NFS, OTB-100, and UAV123. The results show that the proposed approach, specifically DiMP-50 with a ResNet-50 backbone, achieves state-of-the-art performance on most of them:

On VOT2018, DiMP-50 achieves an EAO score of 0.440, outperforming previous methods such as SiamRPN++ and ATOM.
On LaSOT, DiMP-50 achieves an AUC score of 56.9%, showing significant improvement over the previous best results.
On TrackingNet, DiMP-50 records an AUC score of 74.0%, surpassing SiamRPN++.
On GOT10k, whose test object classes do not overlap with its training classes, DiMP-50 achieves an average overlap (AO) score of 61.1%, underscoring its strong generalization capabilities.
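For readers unfamiliar with the overlap-based metrics quoted above, the following sketch shows how average overlap and a success-plot AUC can be computed from predicted and ground-truth boxes. It is an illustration only: each benchmark ships its own evaluation toolkit with its own conventions, and the box format (x, y, w, h), the function names, and the choice of 21 evenly spaced thresholds are assumptions made here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred_boxes, gt_boxes):
    """Mean IoU between predicted and ground-truth boxes over all frames."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(overlaps) / len(overlaps)


def success_auc(overlaps, num_thresholds=21):
    """Area under the success plot.

    For each overlap threshold in [0, 1], the success rate is the fraction
    of frames whose IoU exceeds it; the AUC is the mean of these rates.
    """
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


# Hypothetical two-frame sequence: one perfect prediction, one complete miss.
pred = [(0, 0, 2, 2), (0, 0, 2, 2)]
gt = [(0, 0, 2, 2), (5, 5, 1, 1)]
print(average_overlap(pred, gt))
```

Both metrics reward tight box estimates over the whole sequence rather than merely keeping the target in view, which is why they are sensitive to how well the target model separates the object from its background.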

Implications and Future Work

The findings of this paper have substantial implications for practical tracking applications and theoretical advancements in online learning models. The superiority of discriminative learning methods in tracking suggests that future research could further explore robust loss functions and optimization techniques that can be seamlessly integrated into end-to-end frameworks. Furthermore, the impressive performance on diverse datasets highlights the promise of these methods in real-world scenarios where trackers must adaptively distinguish between targets and complex backgrounds.

Additionally, the demonstrated ability of the model prediction architecture to generalize to unseen objects hints at future possibilities in few-shot and zero-shot learning paradigms within the tracking domain. Researchers may also investigate the impact of alternative architectures for the backbone feature extractor and explore more sophisticated data augmentation techniques to boost tracking robustness even further.

In conclusion, "Learning Discriminative Model Prediction for Tracking" sets a new benchmark in the field of visual tracking by emphasizing the importance of integrating target and background information within an end-to-end learning framework. The proposed approach offers a robust, adaptable, and efficient solution, potentially guiding future research directions toward even more sophisticated and effective tracking systems.

GitHub

GitHub - visionml/pytracking: Visual tracking library based on PyTorch. (3,246 stars)