On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning (2306.04934v2)

Published 8 Jun 2023 in cs.CV

Abstract: Although self-supervised learning (SSL) has been widely studied as a promising technique for representation learning, it does not generalize well on long-tailed datasets because the majority classes dominate the feature space. Recent work shows that long-tailed learning performance can be boosted by sampling extra in-domain (ID) data for self-supervised training; however, large-scale ID data that can rebalance the minority classes are expensive to collect. In this paper, we propose an alternative solution that is easy to use and effective, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which exploits OOD data to dynamically re-balance the feature space. We empirically identify the counter-intuitive usefulness of OOD samples in SSL long-tailed learning and design a principled SSL method around it. Concretely, we first localize the 'head' and 'tail' samples by assigning a tailness score to each OOD sample based on its neighborhood in the feature space. Then, we propose an online OOD sampling strategy to dynamically re-balance the feature space. Finally, we train the model to distinguish ID and OOD samples with a distribution-level supervised contrastive loss. Extensive experiments are conducted on various datasets and several state-of-the-art SSL frameworks to verify the effectiveness of the proposed method. The results show that our method improves the performance of SSL on long-tailed datasets by a large margin, and even outperforms previous work that uses external ID data. Our code is available at https://github.com/JianhongBai/COLT.
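The first two steps named in the abstract (scoring OOD samples by their feature-space neighborhood, then sampling them to re-balance) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes a hypothetical tailness score equal to the mean distance from an OOD feature to its k nearest ID features (sparse ID neighborhoods score higher), and a hypothetical softmax sampler over those scores. The function names, the `k` and `temperature` parameters, and the synthetic 2-D features are all illustrative assumptions; see the linked repository for the actual method.

```python
import numpy as np

def tailness_scores(id_feats, ood_feats, k=5):
    """ASSUMED score (not from the paper): mean Euclidean distance from
    each OOD feature to its k nearest ID features. A sparse ID
    neighborhood, as expected around tail classes, gives a larger score."""
    diff = ood_feats[:, None, :] - id_feats[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))   # shape (n_ood, n_id)
    nearest = np.sort(dists, axis=1)[:, :k]     # k smallest distances per row
    return nearest.mean(axis=1)

def sample_ood(scores, budget, temperature=1.0, rng=None):
    """ASSUMED sampler: draw `budget` distinct OOD indices with
    probability proportional to softmax(scores / temperature)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = scores / temperature
    logits = logits - logits.max()              # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), size=budget, replace=False, p=probs)

# Synthetic 2-D "features": a dense head cluster and a sparse tail cluster.
rng = np.random.default_rng(0)
head = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
tail = rng.normal(loc=8.0, scale=1.0, size=(10, 2))
id_feats = np.concatenate([head, tail])

# OOD candidates: the first 20 lie near the head, the last 20 near the tail.
ood_feats = np.concatenate([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=8.0, scale=0.3, size=(20, 2)),
])

scores = tailness_scores(id_feats, ood_feats, k=5)
picked = sample_ood(scores, budget=10, temperature=0.2, rng=rng)
```

Under these assumptions, OOD candidates that fall in the sparsely populated tail region receive higher scores and are therefore favored by the sampler. In an online setting the scores would be recomputed as the encoder, and hence the feature space, changes during training.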

Authors (7)

Jianhong Bai (14 papers)
Zuozhu Liu (79 papers)
Hualiang Wang (21 papers)
Jin Hao (8 papers)
Yang Feng (231 papers)
Huanpeng Chu (3 papers)
Haoji Hu (30 papers)

Citations (15)

GitHub

GitHub - JianhongBai/COLT: Official implementation of "On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning" (ICLR 2023) (15 stars)
