Papers
Topics
Authors
Recent
2000 character limit reached

Exploring Localization for Self-supervised Fine-grained Contrastive Learning (2106.15788v4)

Published 30 Jun 2021 in cs.CV

Abstract: Self-supervised contrastive learning has demonstrated great potential in learning visual representations. Despite their success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios is not fully explored. We point out that current contrastive methods are prone to memorizing background/foreground texture and therefore have a limitation in localizing the foreground object. Analysis suggests that learning to extract discriminative texture information and localization are equally crucial for fine-grained self-supervised pre-training. Based on our findings, we introduce cross-view saliency alignment (CVSA), a contrastive learning framework that first crops and swaps saliency regions of images as a novel view generation and then guides the model to localize on foreground objects via a cross-view alignment loss. Extensive experiments on both small- and large-scale fine-grained classification benchmarks show that CVSA significantly improves the learned representation.

Citations (8)

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.