What makes instance discrimination good for transfer learning? (2006.06606v2)

Published 11 Jun 2020 in cs.CV

Abstract: Contrastive visual pretraining based on the instance discrimination pretext task has made significant progress. Notably, recent work on unsupervised pretraining has shown to surpass the supervised counterpart for finetuning downstream applications such as object detection and segmentation. It comes as a surprise that image annotations would be better left unused for transfer learning. In this work, we investigate the following problems: What makes instance discrimination pretraining good for transfer learning? What knowledge is actually learned and transferred from these models? From this understanding of instance discrimination, how can we better exploit human annotation labels for pretraining? Our findings are threefold. First, what truly matters for the transfer is low-level and mid-level representations, not high-level representations. Second, the intra-category invariance enforced by the traditional supervised model weakens transferability by increasing task misalignment. Finally, supervised pretraining can be strengthened by following an exemplar-based approach without explicit constraints among the instances within the same category.

Authors (4)
  1. Nanxuan Zhao (36 papers)
  2. Zhirong Wu (31 papers)
  3. Rynson W. H. Lau (54 papers)
  4. Stephen Lin (72 papers)
Citations (164)

Summary

A Study of Instance Discrimination in Transfer Learning

In the paper titled "What makes instance discrimination good for transfer learning?", the authors explore the efficacy of instance discrimination as an unsupervised pretraining objective for transfer learning, particularly in comparison with conventional supervised methods. They aim to explain why instance discrimination, an unsupervised pretext task in which each image is treated as its own class and contrasted against all other images, outperforms supervised pretraining when transferred to downstream visual tasks such as object detection and segmentation.
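To make the pretext task concrete, below is a minimal sketch of an instance-discrimination (InfoNCE-style) objective in the spirit of MoCo, assuming PyTorch. The function name, feature shapes, and temperature value are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def instance_discrimination_loss(q, k, queue, temperature=0.07):
    """InfoNCE-style loss: the two augmented views of an image (q, k) form the
    only positive pair, while features of other images in `queue` are negatives.

    q:     (N, D) query features, L2-normalized
    k:     (N, D) key features of the same images, L2-normalized
    queue: (K, D) memory bank of features from other images, L2-normalized
    """
    pos = torch.einsum("nd,nd->n", q, k).unsqueeze(1)    # (N, 1) similarity to the image's other view
    neg = torch.einsum("nd,kd->nk", q, queue)            # (N, K) similarity to other images
    logits = torch.cat([pos, neg], dim=1) / temperature  # (N, 1 + K)
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive sits at index 0
    return F.cross_entropy(logits, labels)
```

Because the positive is always another view of the same image, the objective never groups different images of the same category, which is exactly the property the paper interrogates.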

Key Findings and Numerical Results

  1. Low- and Mid-level Representation Transfer: The paper establishes that the transfer-learning advantage of instance discrimination arises from the preservation of low- and mid-level representations rather than high-level semantic content. This insight is pivotal because it suggests that the features driving fine-tuned downstream performance do not depend on high-level semantic alignment. Empirically, the authors observed minimal impact on transfer performance even when the high-level semantics of the pretraining data were changed.
  2. Comparison with Supervised Pretraining: A significant observation is that supervised pretraining, which aligns features to high-level category labels, can actually hinder transfer due to task misalignment. Traditional supervised models minimize intra-class variation, discarding instance-specific features and thereby weakening transferability. This was evidenced by supervised models being more susceptible to localization errors than their contrastive counterparts.
  3. Strong Numerical Outcomes: Using momentum contrast (MoCo) for instance discrimination, unsupervised pretraining achieved an average precision (AP) of 46.6% on the PASCAL VOC object detection task, surpassing the supervised model's 42.4%. In addition, error analysis with detection diagnosis tools showed that unsupervised models are less prone to poor-localization errors, indicating better alignment with the downstream task.
  4. Improved Augmentation Strategy: The research underscores the effectiveness of augmentations such as color jittering and random grayscaling, which benefit both supervised and unsupervised models. Unsupervised models gain the larger advantage, however, because their training hinges on image augmentation to create the invariances critical for transfer learning (a sketch of such a two-view augmentation pipeline follows this list).
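As referenced in finding 4, the sketch below shows a plausible two-view augmentation pipeline with color jittering and random grayscaling, assuming torchvision; the crop size, jitter strengths, and probabilities are illustrative rather than the paper's exact settings.

```python
from torchvision import transforms

# Plausible contrastive-pretraining augmentations; each image is transformed
# twice, and the two resulting views form the positive pair for instance discrimination.
contrastive_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Usage: the two views are fed to the query and key encoders, respectively.
# view_q, view_k = contrastive_augment(img), contrastive_augment(img)
```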

Implications and Speculations

This investigation into the nuances of instance discrimination furnishes valuable guidelines for optimizing transfer learning protocols. The insight that unsupervised methods naturally preserve broader image information can inform future architectures and learning paradigms. In particular, the exemplar-based supervised learning approach proposed by the authors could encourage a more nuanced use of categorical annotations, combining the advantages of supervised and unsupervised methodologies.
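The abstract's phrasing, "an exemplar-based approach without explicit constraints among the instances within the same category", suggests that labels can be exploited without forcing same-category instances together. As a purely illustrative sketch, and not the paper's exact formulation, one way to realize such an exemplar-style use of annotations is to keep each image's own augmented view as the sole positive and use the class label only to exclude same-class images from the negative set:

```python
import torch
import torch.nn.functional as F

def exemplar_supervised_loss(q, k, queue, queue_labels, labels, temperature=0.07):
    """Hypothetical exemplar-style use of annotations: each image's own augmented
    view remains the single positive; class labels only mask out same-class images
    from the negatives and never pull them together explicitly."""
    pos = torch.einsum("nd,nd->n", q, k).unsqueeze(1)              # (N, 1)
    neg = torch.einsum("nd,kd->nk", q, queue)                      # (N, K)
    same_class = labels.unsqueeze(1) == queue_labels.unsqueeze(0)  # (N, K) True where classes match
    neg = neg.masked_fill(same_class, float("-inf"))               # labels only remove misleading negatives
    logits = torch.cat([pos, neg], dim=1) / temperature
    target = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, target)
```

This keeps per-instance discrimination intact while still benefiting from the annotations, consistent with the paper's finding that explicit intra-category constraints are what weaken transfer.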

Looking forward, the trade-off introduced by minimizing intra-class variation warrants further exploration. Such questions are central to developing artificial intelligence systems that must balance specificity with generalization across different visual domains. Ultimately, this paper provides a comprehensive evaluation that could catalyze advances in leveraging unsupervised learning for improved transferability, aligning it with real-world applications such as few-shot image recognition and facial landmark prediction.

In summary, by elucidating the elements that make instance discrimination robust in transfer scenarios, this paper not only enhances the understanding of contrastive learning but also provides a pathway for refining supervised pretraining strategies to better align with diverse vision tasks. Such contributions are expected to have a lasting impact on both theoretical explorations and practical implementations in computer vision and machine learning.