
Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification (2405.17790v1)

Published 28 May 2024 in cs.CV

Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits real-world applications. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, in which six existing ReID tasks can be viewed as special cases obtained by assigning different instructions. To facilitate research on this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods, e.g., task-specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For the task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on the OmniReID++ benchmark demonstrate the superiority of our proposed methods, which achieve state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID
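
For readers unfamiliar with the loss family the abstract refers to, the following is a minimal sketch of the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin). It is not the adaptive triplet loss used by IRM, whose formulation is not given in the abstract. The function names `euclidean` and `triplet_loss` and the default margin of 0.3 are illustrative assumptions.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.3,  # assumed value, chosen only for illustration
) -> float:
    """Standard (non-adaptive) triplet margin loss.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    # Positive is close and negative is far: the margin is satisfied.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Positive is far and negative is close: the loss is positive.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

In a retrieval setting, the anchor would be a query feature, the positive a gallery feature of the same identity, and the negative a gallery feature of a different identity.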

