Deep Cosine Metric Learning for Person Re-Identification (1812.00442v1)

Published 2 Dec 2018 in cs.CV and cs.LG

Abstract: Metric learning aims to construct an embedding where two extracted features corresponding to the same identity are likely to be closer than features from different identities. This paper presents a method for learning such a feature space where the cosine similarity is effectively optimized through a simple re-parametrization of the conventional softmax classification regime. At test time, the final classification layer can be stripped from the network to facilitate nearest neighbor queries on unseen individuals using the cosine similarity metric. This approach presents a simple alternative to direct metric learning objectives such as siamese networks that have required sophisticated pair or triplet sampling strategies in the past. The method is evaluated on two large-scale pedestrian re-identification datasets where competitive results are achieved overall. In particular, we achieve better generalization on the test set compared to a network trained with triplet loss.



Summary

The paper proposes a novel deep cosine metric learning method that integrates cosine similarity into a re-parametrized softmax classifier for person re-identification.
It achieves competitive results on the Market-1501 and MARS datasets and generalizes better to the test set than a network trained with triplet loss.
Because it trains as an ordinary classifier, the framework avoids the pair or triplet sampling strategies that direct metric learning objectives have required, which may benefit surveillance and image retrieval applications.

An Overview of Deep Cosine Metric Learning for Person Re-Identification

The paper "Deep Cosine Metric Learning for Person Re-Identification" addresses person re-identification (re-ID) with an approach that integrates metric learning with classification. Person re-identification is a challenging problem in video surveillance and computer vision: it requires matching images of individuals captured by different cameras and from varying viewpoints. The task is further complicated by environmental changes such as lighting and background variations.

Methodological Innovation

The authors propose a method that optimizes a feature space for cosine similarity by re-parametrizing the softmax classifier. Because the cosine similarity is embedded in the classification loss itself, the approach avoids the pair or triplet sampling strategies that direct metric learning frameworks such as siamese and triplet networks have required. At test time, the final classification layer is removed and the learned feature representation is used directly for nearest neighbor queries on unseen individuals under the cosine similarity metric.
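The test-time query step can be illustrated with a minimal sketch. It assumes the embeddings have already been extracted by the network; the toy vectors and the helper names `l2_normalize`, `cosine_similarity`, and `rank_gallery` are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy 3-D embeddings standing in for network outputs (hypothetical values).
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
ranking = rank_gallery(query, gallery)
```

No classifier weights appear in this step: only the embeddings and the cosine similarity are needed, which is why the approach can handle identities that were never seen during training.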

The re-parametrization adds an $\ell_2$ normalization layer so that features have unit length, and it likewise normalizes each class weight vector to unit length. The inner product between a feature and a class weight vector is then the cosine of the angle between them, so training with this cosine softmax loss encourages compact per-identity clusters in the feature space that agree with the cosine similarity measure.
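A minimal sketch of such a loss for a single sample is given below, under stated assumptions. The function name `cosine_softmax_loss` is hypothetical, and the `scale` factor applied to the cosine logits is an assumption of this sketch that the overview does not describe; the block is an illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_softmax_loss(feature, class_weights, label, scale=10.0):
    """Cross-entropy over scaled cosine similarities for one sample.

    `scale` is an assumed temperature-like factor; without it the logits
    would be confined to [-1, 1].
    """
    r = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_unit = l2_normalize(w)
        # Both vectors have unit length, so this dot product is a cosine.
        logits.append(scale * sum(a * b for a, b in zip(w_unit, r)))
    # Numerically stable log-sum-exp for the softmax normalizer.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


# Two classes with weight vectors of different lengths (hypothetical values).
weights = [[2.0, 0.0], [0.0, 3.0]]
feature = [5.0, 1.0]
loss_correct = cosine_softmax_loss(feature, weights, label=0)
loss_wrong = cosine_softmax_loss(feature, weights, label=1)
loss_rescaled = cosine_softmax_loss([50.0, 10.0], weights, label=0)
```

Because both the feature and the weights are normalized, rescaling the feature leaves the loss unchanged: only its direction relative to the class weight vectors matters.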

Evaluation and Dataset Performance

Empirical evaluations were conducted on the Market-1501 and MARS datasets, two large-scale benchmarks in the re-ID domain. The system achieved competitive results overall and generalized better to the test set than a network trained with triplet loss. Performance is measured by mean average precision (mAP) and rank-based accuracy.
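For readers unfamiliar with these metrics, the sketch below computes a rank-k hit and the average precision for one query from a ranked list of gallery identities. It is a simplification: benchmark protocols usually add filtering rules (for example, on which gallery images count as valid matches) that are omitted here, and the function names and toy labels are hypothetical.

```python
def rank_k_hit(ranked_ids, query_id, k=1):
    """True if the query identity appears among the top-k ranked results."""
    return query_id in ranked_ids[:k]


def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each position holding a correct match."""
    hits = 0
    precision_sum = 0.0
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average the per-query AP over (ranked_ids, query_id) pairs."""
    return sum(average_precision(r, q) for r, q in rankings) / len(rankings)


# Gallery identities ordered by decreasing similarity to a query of identity "a".
ranked_ids = ["b", "a", "a", "c"]
ap = average_precision(ranked_ids, "a")   # (1/2 + 2/3) / 2
rank1 = rank_k_hit(ranked_ids, "a", k=1)  # top result is "b", so a miss
rank2 = rank_k_hit(ranked_ids, "a", k=2)
```

Rank-k accuracy is the fraction of queries for which `rank_k_hit` is true, while mAP also rewards placing every correct match near the top of the list.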

Practical Implications and Future Directions

The framework simplifies metric learning for person re-ID by enforcing a similarity measure through a modified classification objective. In practice, it offers an alternative route to building re-identification systems, one that is especially useful where carefully tuned sample mining for metric losses is impractical.

The research currently focuses on a streamlined neural architecture suited to real-time operation; future work could explore scaling the approach to deeper networks and applying it across multiple domains. Because the framework is general, it may also be useful beyond person re-ID, for instance in other image retrieval or matching problems.

More broadly, the paper suggests that combining metric learning with classification is a promising direction for automated recognition systems and their real-world applications.
