Prototype-based Cross-Modal Object Tracking (2312.14471v1)

Published 22 Dec 2023 in cs.CV

Abstract: Cross-modal object tracking is an important research topic in the field of information fusion; it aims to address imaging limitations in challenging scenarios by integrating switchable visible and near-infrared modalities. However, existing tracking methods have difficulty adapting to significant target appearance variations in the presence of modality switching. For instance, model-update-based tracking methods struggle to maintain stable tracking results during modality switching, leading to error accumulation and model drift. Template-based tracking methods rely solely on template information from the first frame and/or the last frame, which lacks sufficient representation ability and struggles to handle significant target appearance changes. To address this problem, we propose a prototype-based cross-modal object tracker called ProtoTrack, which introduces a novel prototype learning scheme to adapt to significant target appearance variations in cross-modal object tracking. In particular, we design a multi-modal prototype that represents target information with multiple kinds of samples, including a fixed sample from the first frame and two representative samples from different modalities. Moreover, we develop a prototype generation algorithm based on two new modules to ensure that the prototype remains representative under different challenges...
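
The abstract only sketches the structure of the multi-modal prototype (a fixed first-frame sample plus one representative sample per modality), so the following Python snippet is a minimal illustration of that idea under stated assumptions. The class name MultiModalPrototype, the confidence-score update rule, the threshold value, and the modality names "visible"/"nir" are all hypothetical and are not the paper's actual implementation.

```python
# Minimal sketch of a multi-modal prototype as described in the abstract.
# All names and the reliability threshold below are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, Optional

import numpy as np


@dataclass
class MultiModalPrototype:
    """Holds a fixed first-frame sample plus one representative sample
    per modality (e.g. visible and near-infrared)."""
    fixed_sample: np.ndarray  # template from the first frame, never replaced
    modality_samples: Dict[str, Optional[np.ndarray]] = field(
        default_factory=lambda: {"visible": None, "nir": None}
    )

    def update(self, sample: np.ndarray, modality: str, score: float,
               threshold: float = 0.8) -> None:
        # Replace the representative sample of the active modality only when
        # the tracker's confidence suggests the new sample is reliable; the
        # fixed first-frame sample is always kept unchanged.
        if score >= threshold:
            self.modality_samples[modality] = sample

    def samples(self) -> list:
        # Samples currently used to represent the target: the fixed sample
        # plus whichever modality-specific samples have been collected.
        return [self.fixed_sample] + [
            s for s in self.modality_samples.values() if s is not None
        ]


# Hypothetical usage: keep the first-frame template fixed and refresh the
# representative sample of whichever modality is currently active.
first_frame_patch = np.zeros((128, 128, 3))
proto = MultiModalPrototype(fixed_sample=first_frame_patch)
proto.update(np.ones((128, 128, 3)), modality="nir", score=0.92)
print(len(proto.samples()))  # 2: fixed sample + one NIR sample
```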

