SCMM: Calibrating Cross-modal Representations for Text-Based Person Search (2304.02278v5)

Published 5 Apr 2023 in cs.CV

Abstract: Text-Based Person Search (TBPS) is a crucial task in the Internet of Things (IoT) domain that enables accurate retrieval of target individuals from large-scale galleries given only a textual caption. For cross-modal TBPS tasks, it is critical to obtain well-distributed representations in the common embedding space to reduce the inter-modal gap. Furthermore, learning detailed image-text correspondences is essential to discriminate similar targets and enable fine-grained search. To address these challenges, we present a simple yet effective method named Sew Calibration and Masked Modeling (SCMM) that calibrates cross-modal representations by learning compact and well-aligned embeddings. SCMM introduces two novel losses for fine-grained cross-modal representations: a Sew calibration loss that aligns image and text features based on textual caption quality, and a Masked Caption Modeling (MCM) loss that establishes detailed relationships between textual and visual parts. This dual-pronged strategy enhances feature alignment and cross-modal correspondences, enabling accurate distinction of similar individuals while maintaining a streamlined dual-encoder architecture for real-time inference, which is essential for resource-limited sensors and IoT systems. Extensive experiments on three popular TBPS benchmarks demonstrate the superiority of SCMM, which achieves 73.81%, 64.25%, and 57.35% Rank-1 accuracy on CUHK-PEDES, ICFG-PEDES, and RSTPReID, respectively.
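The abstract reports results as Rank-1 accuracy and credits the dual-encoder design with keeping inference cheap: gallery image embeddings can be computed once, and each text query is then scored with a single similarity pass. The sketch below illustrates that evaluation in plain Python. It assumes cosine similarity as the matching score and uses toy 2-D vectors in place of real encoder outputs. The function names (l2_normalize, rank1_accuracy) are hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def rank1_accuracy(
    text_embs: Sequence[Sequence[float]],
    text_ids: Sequence[int],
    image_embs: Sequence[Sequence[float]],
    image_ids: Sequence[int],
) -> float:
    """Fraction of text queries whose top-ranked gallery image shares their identity.

    Assumption: cosine similarity is the retrieval score. With a dual encoder,
    the gallery side can be encoded and normalized offline, as done once here.
    """
    gallery = [l2_normalize(e) for e in image_embs]
    hits = 0
    for query, qid in zip(text_embs, text_ids):
        q = l2_normalize(query)
        # One similarity score per gallery image; no cross-modal fusion is needed.
        sims = [sum(a * b for a, b in zip(q, g)) for g in gallery]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(image_ids[best] == qid)
    return hits / len(text_embs)


# Toy data (hypothetical): three text queries against a three-image gallery.
text_embs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.9]]
text_ids = [0, 1, 0]  # the third query's true identity is 0
image_embs = [[1.0, 0.0], [0.1, 1.0], [1.0, 1.0]]
image_ids = [0, 1, 2]

print(rank1_accuracy(text_embs, text_ids, image_embs, image_ids))
```

In this toy run the first two queries retrieve the correct identity. The third query's nearest image belongs to identity 2, so Rank-1 is 2/3. Because the gallery side never depends on the query, its embeddings can be cached, which is the property that makes dual encoders attractive for the real-time, resource-limited settings the abstract describes.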

Citations (3)
