Divide and Contrast: Self-supervised Learning from Uncurated Data

Published 17 May 2021 in cs.CV | (2105.08054v1)

Abstract: Self-supervised learning holds promise in leveraging large amounts of unlabeled data, however much of its progress has thus far been limited to highly curated pre-training data such as ImageNet. We explore the effects of contrastive learning from larger, less-curated image datasets such as YFCC, and find there is indeed a large difference in the resulting representation quality. We hypothesize that this curation gap is due to a shift in the distribution of image classes -- which is more diverse and heavy-tailed -- resulting in less relevant negative samples to learn from. We test this hypothesis with a new approach, Divide and Contrast (DnC), which alternates between contrastive learning and clustering-based hard negative mining. When pretrained on less curated datasets, DnC greatly improves the performance of self-supervised learning on downstream tasks, while remaining competitive with the current state-of-the-art on curated datasets.


Authors (3)

Citations (88)


Summary

The paper introduces DnC, a method that integrates contrastive learning with clustering-based hard negative mining to tackle the curation gap in uncurated datasets.
It employs a three-stage process (base model training, clustering with expert training, and distillation) to merge global and subset-specific features.
Empirical results show gains of up to 3.2% in Top-1 accuracy on ImageNet linear evaluation when pretraining on uncurated datasets, along with improvements on diverse downstream tasks.

Insights into "Divide and Contrast: Self-supervised Learning from Uncurated Data"

The paper "Divide and Contrast: Self-supervised Learning from Uncurated Data" presents an approach to improving self-supervised contrastive learning on uncurated datasets. The central issue is the "curation gap": self-supervised models learn noticeably weaker representations when pretrained on less-curated datasets such as YFCC100M than when pretrained on highly curated ones such as ImageNet.

The authors introduce Divide and Contrast (DnC), a method that alternates between contrastive learning and clustering-based hard negative mining to better handle the diverse, heavy-tailed nature of large uncurated datasets. When pretrained on less-curated data, DnC significantly improves performance on downstream tasks, while remaining competitive on curated datasets.

Key Methodological Contributions

The paper first highlights the limitations of existing self-supervised learning methods on uncurated data, attributing them primarily to a shift in the class distribution: uncurated data is more diverse and heavy-tailed, so randomly drawn negative samples are less relevant to a given image and offer less to learn from. To address this, the authors hypothesize that clustering such a dataset recovers locally consistent subsets, so that contrastive learning within a subset focuses on more relevant, harder negatives.
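To make the role of negative samples concrete, the following is a minimal sketch of a standard InfoNCE contrastive loss for a single anchor. It is an illustration only: the loss form, the temperature value, the 2-D toy vectors, and the names `info_nce`, `easy_negatives`, and `hard_negatives` are assumptions made for this example, not details taken from the paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair.

    Similarities are cosine similarities divided by a temperature
    (the value 0.1 is an assumption for illustration).
    """
    a = normalize(anchor)
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings (hypothetical values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
# "Easy" negatives point away from the anchor, e.g. unrelated classes.
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]
# "Hard" negatives lie close to the anchor, e.g. images from the same cluster.
hard_negatives = [[0.8, 0.6], [0.8, -0.6]]

easy_loss = info_nce(anchor, positive, easy_negatives)
hard_loss = info_nce(anchor, positive, hard_negatives)
print(f"easy negatives: {easy_loss:.5f}  hard negatives: {hard_loss:.5f}")
```

With these toy vectors, the easy negatives yield a loss close to zero, so they carry almost no training signal, while negatives that lie near the anchor produce a clearly larger loss. This is the intuition behind drawing negatives from within a cluster of similar images.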

DnC operates in three sequential stages:

Base Model Training: A self-supervised model (MoCLR, an improved SimCLR) is trained on the entire dataset. This base model's embeddings serve as the foundation for clustering.
Clustering and Expert Training: The dataset is clustered using the base model embeddings, aiming to obtain subsets of semantically similar images. Expert models are then trained on each subset.
Distillation: The knowledge in the base model and expert models is distilled into a single model, allowing it to integrate both globally learned and subset-specific features.
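The data flow between the stages above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the model training steps are omitted, the k-means variant (Lloyd's algorithm with a deterministic farthest-first initialization), the toy embeddings, and all function names are hypothetical, and the squared-error distillation objective is an assumption, since this summary does not specify how the distillation is performed.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def init_centroids(points, k):
    """Deterministic farthest-first initialization (an illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster id per point."""
    centroids = init_centroids(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def split_by_cluster(assign, k):
    """Stage 2: group dataset indices into one training subset per expert."""
    subsets = [[] for _ in range(k)]
    for idx, c in enumerate(assign):
        subsets[c].append(idx)
    return subsets


def distill_loss(pred_base, pred_expert, base_target, expert_target):
    """Stage 3 (assumed objective): regress onto base and expert embeddings."""
    return sq_dist(pred_base, base_target) + sq_dist(pred_expert, expert_target)


# Stand-in for base-model embeddings of six images (hypothetical values).
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
assign = kmeans(embeddings, k=2)
subsets = split_by_cluster(assign, k=2)
print(subsets)  # each inner list holds the image indices for one expert
```

In this sketch, each expert would be trained contrastively on the images indexed by its subset, so its negatives come from the same cluster. During distillation, the expert target for an image would come from the expert responsible for that image's cluster.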

Empirical Results

On uncurated datasets such as YFCC100M and JFT-300M, DnC improves on MoCLR and BYOL, with gains of up to 3.2% in Top-1 accuracy under ImageNet linear evaluation. It also performs better on diverse downstream tasks, including fine-grained classification, object detection, and semantic segmentation.

On curated datasets such as ImageNet, DnC yields only minimal improvement over state-of-the-art models but remains competitive, suggesting that the method is broadly applicable rather than specific to uncurated data.

Implications and Future Directions

The implications of successfully applying self-supervised learning to uncurated data are far-reaching. The ability to harness vast amounts of uncurated data without requiring the exhaustive labeling necessary for curated datasets can significantly broaden the scope of AI applications, especially in domains where labeled data is scarce or costly to obtain.

Theoretically, the success of DnC suggests avenues for improving contrastive learning further by strategically leveraging clustering mechanisms. Future research could explore more adaptive clustering techniques, investigate other self-supervised paradigms in conjunction with DnC, or apply DnC to data modalities beyond images.

Beyond the approach itself, the clear evidence of a curation gap in self-supervised learning prompts a reevaluation of benchmarks traditionally used to assess these models, advocating for increased use of uncurated datasets to test self-supervised methods’ robustness.

In summary, "Divide and Contrast" contributes to the literature by demonstrating both the challenges of extending self-supervised learning to uncurated data and a potential solution. The work is a step toward more broadly applicable self-supervised models that do not depend on curated pre-training data.
