Intriguing Properties of Contrastive Losses (2011.02803v3)

Published 5 Nov 2020 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: We study three intriguing properties of contrastive learning. First, we generalize the standard contrastive loss to a broader family of losses, and we find that various instantiations of the generalized loss perform similarly under the presence of a multi-layer non-linear projection head. Second, we study if instance-based contrastive learning (with a global image representation) can learn well on images with multiple objects present. We find that meaningful hierarchical local features can be learned despite the fact that these objectives operate on global instance-level features. Finally, we study the phenomenon of feature suppression among competing features shared across augmented views, such as "color distribution" vs "object class". We construct datasets with explicit and controllable competing features, and show that, for contrastive learning, a few bits of easy-to-learn shared features can suppress, and even fully prevent, the learning of other sets of competing features. In scenarios where there are multiple objects in an image, the dominant object would suppress the learning of smaller objects. Existing contrastive learning methods critically rely on data augmentation to favor certain sets of features over others, and could suffer from learning saturation for scenarios where existing augmentations cannot fully address the feature suppression. This poses open challenges to existing contrastive learning techniques.

Summary

The paper generalizes the standard contrastive loss into a family of losses that combine an alignment term with a distribution matching term, for unsupervised visual representation learning.
It shows that deeper projection heads reduce the performance gaps among these loss formulations in linear evaluation on CIFAR-10 and ImageNet.
The study identifies feature suppression challenges in instance-based learning, where dominant features can hinder the extraction of finer, useful details in complex images.

An Analysis of Contrastive Losses and Their Properties

Contrastive learning has become a leading technique for unsupervised visual representation learning, with performance that rivals supervised approaches. The paper "Intriguing Properties of Contrastive Losses" examines three aspects of it: a generalized contrastive loss, the ability of instance-based objectives to handle images with multiple objects, and the suppression of some features by others.

Generalized Contrastive Loss

The paper introduces a generalized framework for contrastive loss that goes beyond the conventional cross-entropy based NT-Xent loss. The framework defines a broader family of contrastive losses, each consisting of an alignment term and a distribution matching term. The distribution matching term can use the Sliced Wasserstein Distance (SWD), which supports diverse prior distributions, a flexibility that the LogSumExp term of the standard loss does not offer. The experiments suggest that, with a multi-layer non-linear projection head, the various instantiations of the generalized contrastive loss yield comparable results.
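As a rough illustration of the two-term structure described above, the following NumPy sketch combines an alignment term with an SWD-based distribution matching term. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the squared-distance alignment term, the uniform-hypersphere prior, the use of random unit directions for the slices, and the weight `lam` are all choices made here for illustration.

```python
import numpy as np


def l2_normalize(z):
    """Project each row onto the unit hypersphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def alignment_loss(z_a, z_b):
    """Mean squared distance between the two views of each example."""
    return np.mean(np.sum((z_a - z_b) ** 2, axis=1))


def sample_uniform_hypersphere(n, d, rng):
    """Draw n points uniformly from the unit hypersphere in d dimensions."""
    return l2_normalize(rng.standard_normal((n, d)))


def sliced_wasserstein(z, prior_samples, n_projections, rng):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    Both sets must hold the same number of points so that the sorted 1-D
    projections can be matched index by index.
    """
    d = z.shape[1]
    directions = l2_normalize(rng.standard_normal((n_projections, d)))
    proj_z = np.sort(z @ directions.T, axis=0)              # (n, n_projections)
    proj_p = np.sort(prior_samples @ directions.T, axis=0)  # (n, n_projections)
    return np.mean((proj_z - proj_p) ** 2)


def generalized_contrastive_loss(z_a, z_b, lam=1.0, n_projections=64, seed=0):
    """Alignment term plus lam times a distribution matching term.

    The prior (uniform on the hypersphere) is an assumption of this sketch;
    any sampler returning an array of the same shape could be swapped in.
    """
    rng = np.random.default_rng(seed)
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    z = np.concatenate([z_a, z_b], axis=0)
    prior = sample_uniform_hypersphere(z.shape[0], z.shape[1], rng)
    align = alignment_loss(z_a, z_b)
    dist = sliced_wasserstein(z, prior, n_projections, rng)
    return align + lam * dist, align, dist
```

Replacing `sample_uniform_hypersphere` with a different sampler changes the prior without touching the rest of the loss, which is the kind of flexibility the SWD-based term is meant to provide.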

This is corroborated by linear evaluation on CIFAR-10 and ImageNet: the performance gaps among the generalized contrastive losses shrink as the projection head gets deeper. In other words, with a sufficiently deep head, the quality of the learned representation is largely insensitive to the particular loss formulation.
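To make "deeper projection head" concrete, here is a minimal NumPy sketch of an MLP head whose depth can be varied. The function names, the ReLU non-linearity, the initialization, and the absence of normalization layers are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np


def init_projection_head(in_dim, hidden_dim, out_dim, n_layers, seed=0):
    """Create (weight, bias) pairs for an MLP head with n_layers linear layers.

    n_layers=1 gives a purely linear head; larger values add hidden layers.
    """
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden_dim] * (n_layers - 1) + [out_dim]
    return [
        (rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in), np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]


def apply_projection_head(params, h):
    """Map encoder features h to the space in which the loss is computed."""
    z = h
    for i, (w, b) in enumerate(params):
        z = z @ w + b
        if i < len(params) - 1:  # ReLU after every layer except the last
            z = np.maximum(z, 0.0)
    return z
```

In this setup, the contrastive loss is computed on the head's output, while linear evaluation uses the encoder features `h` that feed into it.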

Instance-Based Learning with Multiple Objects

Traditional contrastive learning methods operate at the instance level, encoding each image into a single vector representation. The paper asks whether such methods remain effective when images contain multiple objects. Using a purpose-built MultiDigits dataset, it demonstrates that instance-based objectives can still learn useful features from images containing multiple objects. Clustering local features with K-means further indicates that both SimCLR and supervised learning extract meaningful hierarchical local features, even though they are trained on global representations.
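The local feature clustering mentioned above can be sketched as follows: treat every spatial position of an intermediate feature map as one vector, cluster those vectors with K-means, and reshape the cluster labels back into a coarse segmentation map. This is an illustrative sketch only. The helper names, the L2 normalization, and the deterministic farthest-point initialization are assumptions made here, and the feature map is a stand-in for activations from a trained encoder.

```python
import numpy as np


def farthest_point_init(points, k):
    """Deterministic initialization: start at points[0], then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(k - 1):
        diffs = points[:, None, :] - np.array(centers)[None, :, :]
        nearest = (diffs ** 2).sum(axis=2).min(axis=1)
        centers.append(points[nearest.argmax()])
    return np.array(centers, dtype=float)


def assign(points, centers):
    """Index of the nearest center for every point."""
    diffs = points[:, None, :] - centers[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)


def kmeans(points, k, n_iters=20):
    """Plain Lloyd iterations; empty clusters keep their previous center."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iters):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign(points, centers), centers


def cluster_local_features(feature_map, k):
    """feature_map: (H, W, C) activations from an intermediate layer.

    Returns an (H, W) array of cluster ids, one per spatial position.
    """
    h, w, c = feature_map.shape
    local = feature_map.reshape(h * w, c)
    local = local / (np.linalg.norm(local, axis=1, keepdims=True) + 1e-8)
    labels, _ = kmeans(local, k)
    return labels.reshape(h, w)
```

Running this on feature maps from different layers, or with different values of `k`, is one way to inspect how local features group at different levels of the hierarchy.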

Feature Suppression Phenomenon

Feature suppression is a critical challenge: easy-to-learn features can crowd out the learning of other salient features. Using datasets with explicit and controllable competing features, the paper shows that dominant features can suppress, and even fully prevent, the learning of subordinate ones, and that existing data augmentations can only partially mitigate the problem.

Furthermore, the analysis shows that a few bits of easy-to-learn shared features can severely impair representation quality. For instance, appending extra channels of random bits to the RGB channels leads to a dramatic fall in performance. The paper also points to a saturation effect: where existing augmentations cannot remove the competing features, contrastive learning may stop improving once the easy features are captured, which poses open challenges for the method's scalability and robustness.
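The random-bit construction can be sketched as follows: draw one random integer per image, write it out as binary digits, and append each digit as a spatially constant channel to both augmented views, so the bits are shared across views and trivially easy to match. This is a minimal sketch under assumptions: the function names, the bit ordering, and the uniform sampling range are choices made here, not details confirmed by the source.

```python
import numpy as np


def int_to_bits(value, n_bits):
    """Binary digits of value, most significant bit first."""
    return np.array(
        [(value >> i) & 1 for i in reversed(range(n_bits))], dtype=np.float32
    )


def append_shared_bits(view_a, view_b, n_bits, rng):
    """Append the same n_bits constant channels to both augmented views.

    view_a, view_b: (H, W, 3) arrays holding two augmentations of one image.
    Returns the two (H, W, 3 + n_bits) views and the sampled integer.
    """
    value = int(rng.integers(0, 2 ** n_bits))  # assumed: uniform over n_bits-bit integers
    bits = int_to_bits(value, n_bits)

    def with_bits(view):
        h, w, _ = view.shape
        planes = np.broadcast_to(bits, (h, w, n_bits))  # each bit fills one channel
        return np.concatenate([view, planes], axis=-1)

    return with_bits(view_a), with_bits(view_b), value
```

Because the extra channels are identical in both views and differ between images, a model can separate positives from negatives using those channels alone, which is the shortcut the experiment is designed to expose.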

Implications and Future Directions

The paper's findings have implications for both the theory and the practice of contrastive learning. The generalized loss framework could pave the way for new loss formulations and optimization strategies. Insights into feature suppression could inform more effective data augmentation techniques, or even the integration of generative models to work around saturation.

Addressing feature suppression will be central to advancing contrastive learning. Models that treat competing features more evenly, perhaps by drawing on generative models such as VAEs, could offer viable solutions. Further theoretical work, for example on mutual information estimation, might also help address these challenges.

In conclusion, this study of contrastive losses offers a useful perspective on both their strengths and their limitations. It lays groundwork for further research into model robustness and feature learning in unsupervised representation learning.