CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Published 18 Feb 2021 in cs.CV | (2102.09559v2)

Abstract: Semi-supervised learning on class-imbalanced data, although a realistic problem, has been under studied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high precision pseudo-labels on minority classes. By exploiting this property, in this work, we propose Class-Rebalancing Self-Training (CReST), a simple yet effective framework to improve existing SSL methods on class-imbalanced data. CReST iteratively retrains a baseline SSL model with a labeled set expanded by adding pseudo-labeled samples from an unlabeled set, where pseudo-labeled samples from minority classes are selected more frequently according to an estimated class distribution. We also propose a progressive distribution alignment to adaptively adjust the rebalancing strength dubbed CReST+. We show that CReST and CReST+ improve state-of-the-art SSL algorithms on various class-imbalanced datasets and consistently outperform other popular rebalancing methods. Code has been made available at https://github.com/google-research/crest.

Summary

The paper introduces CReST, a novel framework that improves SSL under imbalanced data conditions by prioritizing minority class pseudo-labels.
It employs iterative retraining and a temperature-based scaling mechanism to progressively adjust rebalancing and enhance pseudo-label quality.
Empirical results show up to an 11.8% accuracy improvement, underscoring its effectiveness in boosting minority class recall in SSL.

A Formal Analysis of the CReST Framework for Imbalanced Semi-Supervised Learning

This essay analyzes the paper "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning," authored by Chen Wei and collaborators during Wei's internship at Google. The paper proposes a method for semi-supervised learning (SSL) on class-imbalanced data, a problem that has received far less attention than the balanced setting.

Framework Introduction and Motivation

The authors propose the Class-Rebalancing Self-Training (CReST) framework to improve SSL performance on imbalanced datasets, a common yet challenging scenario in real-world applications. SSL methods use large amounts of unlabeled data to improve model generalization, but they typically assume that classes are roughly uniformly represented. On imbalanced data, existing methods perform poorly on minority classes: recall is low, yet the pseudo-labels they assign to minority classes remain high in precision. In other words, the model rarely predicts a minority class, but when it does, the prediction is usually correct. The authors exploit this observation in CReST.
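The distinction between low recall and high precision on a minority class can be made concrete with a small calculation. The sketch below is illustrative only: the function name `per_class_precision_recall` and the toy labels are assumptions made for this example, not data or code from the paper.

```python
def per_class_precision_recall(true_labels, predicted_labels, cls):
    """Return (precision, recall) for one class from parallel label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == cls and p == cls)
    predicted_as_cls = sum(1 for _, p in pairs if p == cls)
    actually_cls = sum(1 for t, _ in pairs if t == cls)
    # Guard against division by zero when a class is never predicted or absent.
    precision = true_positives / predicted_as_cls if predicted_as_cls else 0.0
    recall = true_positives / actually_cls if actually_cls else 0.0
    return precision, recall


# Toy data (hypothetical): 8 majority samples and 4 minority samples.
true_labels = ["maj"] * 8 + ["min"] * 4
# The model labels only one minority sample as "min", and that one is correct.
predicted_labels = ["maj"] * 8 + ["min", "maj", "maj", "maj"]

minority_precision, minority_recall = per_class_precision_recall(
    true_labels, predicted_labels, "min"
)
```

In this toy case the minority class has precision 1.0 but recall 0.25, which is the pattern the paragraph above describes: few minority predictions, but trustworthy ones.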

CReST and CReST+ Methodologies

CReST iteratively retrains a baseline SSL model, each time expanding the labeled set with pseudo-labeled samples drawn from the unlabeled set. The key idea is to adjust how often pseudo-labeled samples are selected so that minority classes are favored, according to an estimated class distribution. Traditional rebalancing strategies depend on ground-truth labels being available for the data they rebalance. CReST instead relies on the high precision of minority-class pseudo-labels to make the selection reliable.
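The selection step described above can be sketched as follows. This summary does not state the exact sampling rule, so the sketch assumes one common formulation: with classes sorted from most to least frequent, the class at rank l keeps a fraction (N_{L+1-l} / N_1)^alpha of its pseudo-labeled samples, so the rarest class keeps everything and the most frequent class keeps the least. The function names, the default `alpha`, and the choice to keep the most confident samples first are assumptions for illustration, not the authors' released code.

```python
from collections import defaultdict


def rebalanced_keep_rates(class_counts, alpha=0.5):
    """Map each class to the fraction of its pseudo-labels to keep.

    class_counts: dict of class -> estimated number of samples.
    alpha: hypothetical strength parameter; 0 keeps everything for all classes.
    """
    # Classes ordered from most to least frequent.
    order = sorted(class_counts, key=lambda c: class_counts[c], reverse=True)
    largest = class_counts[order[0]]
    rates = {}
    for rank, cls in enumerate(order):
        # Assumed rule: pair each class with its mirror in the frequency ranking.
        mirrored = order[len(order) - 1 - rank]
        rates[cls] = (class_counts[mirrored] / largest) ** alpha
    return rates


def select_pseudo_labels(predictions, rates):
    """Keep the most confident fraction of pseudo-labels per predicted class.

    predictions: list of (sample_id, predicted_class, confidence).
    Returns a list of (sample_id, predicted_class) to add to the labeled set.
    """
    by_class = defaultdict(list)
    for sample_id, cls, confidence in predictions:
        by_class[cls].append((confidence, sample_id))
    selected = []
    for cls, items in by_class.items():
        items.sort(reverse=True)  # highest confidence first
        keep = int(rates[cls] * len(items))
        selected.extend((sample_id, cls) for _, sample_id in items[:keep])
    return selected
```

A retraining loop would call `rebalanced_keep_rates` once per generation on the estimated class counts, then pass the model's predictions on the unlabeled set to `select_pseudo_labels` before retraining on the expanded labeled set.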

Furthermore, the authors extend CReST with a progressive distribution alignment strategy, referred to as CReST+. This extension adaptively adjusts the rebalancing strength in order to improve the quality of the pseudo-labels generated online during training. A temperature-based scaling mechanism controls how strongly class probabilities are redistributed, and the rebalancing is progressively intensified over successive training generations.
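A minimal sketch of how temperature scaling can drive progressive distribution alignment is given below. It rests on assumptions that this summary does not spell out: that alignment rescales each prediction by a target distribution divided by a running mean of the model's predictions, that the target is the class prior raised to a temperature t and renormalized, and that t is annealed linearly from 1.0 to a floor `t_min` across generations. All function names and default values are hypothetical.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def temperature_for_generation(generation, last_generation, t_min=0.5):
    """Assumed linear schedule: 1.0 at generation 0, t_min at the last one."""
    return 1.0 - (generation / last_generation) * (1.0 - t_min)


def smoothed_target(class_prior, temperature):
    """Flatten the class prior; temperature 1.0 leaves it unchanged,
    temperature 0.0 makes it uniform."""
    return normalize([p ** temperature for p in class_prior])


def align(prediction, target, running_mean_prediction):
    """Rescale one predicted distribution toward the target distribution."""
    scaled = [
        q * t / m
        for q, t, m in zip(prediction, target, running_mean_prediction)
    ]
    return normalize(scaled)


# Toy usage (hypothetical numbers): two classes with a 90/10 prior.
prior = [0.9, 0.1]
t = temperature_for_generation(generation=4, last_generation=4, t_min=0.5)
target = smoothed_target(prior, t)
aligned = align([0.6, 0.4], target, running_mean_prediction=[0.8, 0.2])
```

Lower temperatures move the target closer to uniform, so later generations push predictions toward minority classes more strongly than earlier ones, which matches the progressive intensification described above.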

Experimental Insights

Through empirical evaluations on CIFAR10-LT, CIFAR100-LT, and ImageNet127, the paper shows that CReST significantly boosts the performance of state-of-the-art SSL methods, with accuracy improvements of up to 11.8%. The gains are concentrated in minority classes, where iterative retraining on the rebalanced pseudo-labeled set raises recall. CReST+ improves these results further, again primarily through better recall on minority classes, which supports the case for progressive rebalancing.

Implications and Future Directions

The proposed CReST framework introduces a pragmatic technique for addressing imbalanced learning scenarios within SSL paradigms. In terms of practical implications, the framework offers a robust solution for applications where data imbalance is prevalent, such as in medical diagnostics or ecological monitoring. Future research could explore the integration of CReST with other modalities beyond image classification or adapt it towards different learning paradigms, such as active learning or transfer learning.

Conclusion

In conclusion, this paper contributes a useful tool for the SSL community and addresses a significant gap in the handling of class imbalance. By building on the high precision of minority-class pseudo-labels, CReST produces more balanced predictions in semi-supervised settings. Researchers and practitioners working with imbalanced data stand to benefit from the consistent gains the framework shows across the evaluated class-imbalanced datasets.
