Deep Fusion: Capturing Dependencies in Contrastive Learning via Transformer Projection Heads (2403.18681v2)

Published 27 Mar 2024 in cs.LG and cs.AI

Abstract: Contrastive Learning (CL) has emerged as a powerful method for training feature extraction models using unlabeled data. Recent studies suggest that incorporating a linear projection head post-backbone significantly enhances model performance. In this work, we investigate the use of a transformer model as a projection head within the CL framework, aiming to exploit the transformer's capacity for capturing long-range dependencies across embeddings to further improve performance. Our key contributions are fourfold: First, we introduce a novel application of transformers in the projection head role for contrastive learning, marking the first endeavor of its kind. Second, our experiments reveal a compelling "Deep Fusion" phenomenon where the attention mechanism progressively captures the correct relational dependencies among samples from the same class in deeper layers. Third, we provide a theoretical framework that explains and supports this "Deep Fusion" behavior. Finally, we demonstrate through experimental results that our model achieves superior performance compared to the existing approach of using a feed-forward layer.

References (31)

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Deep Fusion: Capturing Dependencies in Contrastive Learning via Transformer Projection Heads (2403.18681v2)

Summary

Related Papers