Understanding and Improving Early Stopping for Learning with Noisy Labels (2106.15853v2)

Published 30 Jun 2021 in cs.LG

Abstract: The memorization effect of deep neural network (DNN) plays a pivotal role in many state-of-the-art label-noise learning methods. To exploit this property, the early stopping trick, which stops the optimization at the early stage of training, is usually adopted. Current methods generally decide the early stopping point by considering a DNN as a whole. However, a DNN can be considered as a composition of a series of layers, and we find that the latter layers in a DNN are much more sensitive to label noise, while their former counterparts are quite robust. Therefore, selecting a stopping point for the whole network may make different DNN layers antagonistically affected each other, thus degrading the final performance. In this paper, we propose to separate a DNN into different parts and progressively train them to address this problem. Instead of the early stopping, which trains a whole DNN all at once, we initially train former DNN layers by optimizing the DNN with a relatively large number of epochs. During training, we progressively train the latter DNN layers by using a smaller number of epochs with the preceding layers fixed to counteract the impact of noisy labels. We term the proposed method as progressive early stopping (PES). Despite its simplicity, compared with the early stopping, PES can help to obtain more promising and stable results. Furthermore, by combining PES with existing approaches on noisy label training, we achieve state-of-the-art performance on image classification benchmarks.

Summary

The paper introduces Progressive Early Stopping (PES), which splits a network into parts and trains them in successive phases to mitigate the impact of noisy labels.
PES trains the earlier layers for a relatively large number of epochs, then trains the later layers for fewer epochs with the earlier layers fixed, limiting how much label noise the later layers memorize.
Experiments on CIFAR-10, CIFAR-100, and Clothing-1M report accuracy gains over prior state-of-the-art approaches, most notably in high-noise settings.

Understanding and Improving Early Stopping for Learning with Noisy Labels

The paper "Understanding and Improving Early Stopping for Learning with Noisy Labels" addresses label noise in the training of deep neural networks (DNNs). Its main contribution is Progressive Early Stopping (PES), which replaces the single stopping point chosen for a whole network with separate, progressively shorter training budgets for different parts of the network.

Key Contributions and Methodology

A central observation underpinning the research is that DNN layers differ in their sensitivity to label noise: the latter layers are much more sensitive, while the former layers are quite robust. Conventional early stopping treats the DNN as a whole and picks one stopping point for every layer, so the layers work against each other. A stopping point late enough to train the robust early layers well lets the later layers memorize noisy labels, while a point early enough to protect the later layers cuts short the training of the early layers. Either way, final performance degrades.

PES splits the network into several parts and trains them progressively. The earlier layers are trained first, by optimizing the network for a relatively large number of epochs so that they fit the clean data patterns well. The latter layers are then trained for fewer epochs, interspersed with reinitializations, while the preceding layers remain fixed. This stepwise optimization exploits the memorization effect part by part: it limits the impact of noise on the latter layers without giving up what the earlier layers have learned.
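The staged schedule described above can be sketched in a framework-agnostic way. This is a minimal illustration rather than the authors' implementation: the function name `progressive_early_stopping`, the callbacks `train_one_epoch` and `reinitialize`, the part names, and the epoch budgets are all hypothetical, and the sketch assumes that every not-yet-fixed part is reinitialized at the start of each later stage.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Tuple[List[str], List[str], int]  # (frozen parts, trainable parts, epochs)


def progressive_early_stopping(
    parts: Sequence[str],
    epochs_per_stage: Sequence[int],
    train_one_epoch: Callable[[List[str]], None],
    reinitialize: Callable[[str], None],
) -> List[Stage]:
    """Run a PES-style schedule and return what happened in each stage.

    parts: network parts ordered from input to output (hypothetical names).
    epochs_per_stage: one epoch budget per part; stage 0 trains the whole
        network, and each later stage fixes one more leading part.
    """
    if len(parts) != len(epochs_per_stage):
        raise ValueError("need exactly one epoch budget per part")

    log: List[Stage] = []
    for stage, epochs in enumerate(epochs_per_stage):
        frozen = list(parts[:stage])      # preceding parts stay fixed
        trainable = list(parts[stage:])   # remaining parts are (re)trained
        if stage > 0:
            # Assumption: reset the later parts before their shorter run.
            for name in trainable:
                reinitialize(name)
        for _ in range(epochs):
            train_one_epoch(trainable)
        log.append((frozen, trainable, epochs))
    return log


# Usage with stand-in callbacks that only record which parts were updated.
calls: List[Tuple[str, ...]] = []
log = progressive_early_stopping(
    parts=["block1", "block2", "classifier"],
    epochs_per_stage=[25, 7, 5],  # illustrative budgets, not from the paper
    train_one_epoch=lambda trainable: calls.append(tuple(trainable)),
    reinitialize=lambda name: None,
)
```

In a real setup, `train_one_epoch` would run an optimizer over only the parameters of the listed parts, and `reinitialize` would reset the weights of the named part. The decreasing budgets mirror the idea that parts closer to the output should see fewer epochs of noisy supervision.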

The strategy is evaluated using ResNet architectures on CIFAR-10 and CIFAR-100 with synthetically injected label noise, as well as on the real-world noisy dataset Clothing-1M. PES improves the quality of confident example selection (the subset of training examples identified as likely clean), and it can be combined with semi-supervised learning frameworks such as MixMatch to further improve generalization.
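To make confident example selection concrete, the sketch below splits a training set into likely-clean and remaining examples. It assumes a simple agreement rule, in which an example counts as confident when the trained model's top prediction matches its given label; this summary does not spell out the paper's exact criterion, so the rule, the function name `split_confident`, and the toy probabilities are illustrative only.

```python
from typing import List, Sequence, Tuple


def split_confident(
    probs: Sequence[Sequence[float]],
    noisy_labels: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Return (confident, remaining) example indices.

    Assumed rule: an example is confident when the class with the highest
    predicted probability equals its (possibly noisy) given label.
    """
    confident: List[int] = []
    remaining: List[int] = []
    for i, (p, y) in enumerate(zip(probs, noisy_labels)):
        predicted = max(range(len(p)), key=p.__getitem__)
        (confident if predicted == y else remaining).append(i)
    return confident, remaining


# Toy predictions for four examples over three classes.
probs = [
    [0.9, 0.1, 0.0],  # predicts class 0
    [0.2, 0.7, 0.1],  # predicts class 1
    [0.1, 0.3, 0.6],  # predicts class 2
    [0.5, 0.4, 0.1],  # predicts class 0
]
noisy_labels = [0, 2, 2, 1]
confident, remaining = split_confident(probs, noisy_labels)
```

Under this assumption, a semi-supervised method would treat the confident examples as labeled data and could use the remaining examples without their labels.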

Numerical Results and Implications

The reported experiments show PES achieving higher classification accuracy than prior methods across a range of noisy-label benchmarks. It outperforms state-of-the-art approaches such as DivideMix by large margins, especially under heavy noise such as 45% pairflip noise on CIFAR-10. The gains also extend to instance-dependent noise and to real-world noise, as shown by the results on Clothing-1M.
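For readers unfamiliar with pairflip noise, the following sketch shows one common way such noise is injected into a clean label set. It assumes the usual convention in which each label is flipped, with a fixed probability, to the next class index (wrapping around at the last class); the function name `pairflip` and the seed handling are illustrative and not taken from the paper's code.

```python
import random
from typing import List, Sequence


def pairflip(
    labels: Sequence[int],
    num_classes: int,
    rate: float,
    seed: int = 0,
) -> List[int]:
    """Flip each label to its neighboring class with probability `rate`.

    Assumed convention: class c is flipped to (c + 1) % num_classes.
    """
    rng = random.Random(seed)
    return [
        (y + 1) % num_classes if rng.random() < rate else y
        for y in labels
    ]


# 10,000 clean labels spread evenly over 10 classes, then 45% pairflip noise.
clean = [i % 10 for i in range(10000)]
noisy = pairflip(clean, num_classes=10, rate=0.45)
flipped_fraction = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
```

At a 45% rate, nearly half of the labels in each class point to a single wrong class, so the correct label holds only a narrow majority. That is why this setting is regarded as a particularly hard benchmark.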

These findings suggest that accounting for layer-specific sensitivity to label noise leads to a more robust learning process. This is promising for deploying machine learning models in settings where label integrity cannot be fully guaranteed.

Prospects for Future Developments

The progressive layered training paradigm introduced by PES opens avenues for further investigation into the inherent dynamics of noisy label learning. Future research could explore adaptive mechanisms that dynamically determine epoch allocations for layers based on intermediate performance metrics. Moreover, investigating the integration of PES with novel semi-supervised learning strategies could yield additional insights, potentially culminating in more adaptive and noise-agnostic learning frameworks.

In conclusion, this research offers a simple and effective response to the challenges posed by noisy labels in deep learning: decompose the optimization of a DNN into parts and stop each part at its own point. It highlights that the interaction between deep networks and imperfect annotations differs from layer to layer, and that training procedures can exploit this difference.
