
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels (1805.07836v4)

Published 20 May 2018 in cs.LG, cs.CV, and stat.ML

Abstract: Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes with the expensive cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and challenging datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. Proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and FASHION-MNIST datasets and synthetically generated noisy labels.

Authors (2)
  1. Zhilu Zhang (33 papers)
  2. Mert R. Sabuncu (87 papers)
Citations (2,335)

Summary

  • The paper introduces a novel Generalized Cross Entropy loss that balances CCE and MAE to enhance robustness to noisy labels in DNN training.
  • Empirical results on CIFAR-10, CIFAR-100, and Fashion-MNIST demonstrate significant accuracy improvements under both uniform and class-dependent noise.
  • Theoretical analysis shows that controlling gradient behavior helps prevent overfitting, making the loss directly applicable to standard DNN architectures.

Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels

The paper "Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels" addresses a crucial challenge in the training of deep neural networks (DNNs): the impact of noisy labels on model performance. The common categorical cross entropy (CCE) loss, while effective under clean data, is highly sensitive to label noise, leading to degraded performance. This paper proposes the Generalized Cross Entropy (GCE) loss and its truncated version as robust alternatives, applicable directly to existing DNN architectures with minimal modifications.

Contribution and Significance

The main contributions of the paper are twofold:

  1. Introduction of Noise-Robust GCE Loss: The authors formalize a new class of loss functions, termed $\mathcal{L}_q$, which generalizes both CCE and Mean Absolute Error (MAE), with $q \in (0, 1]$. The GCE loss balances the robustness of MAE against the implicit sample weighting of CCE, facilitating efficient learning despite label noise.
  2. Empirical Validation: Comprehensive experiments using CIFAR-10, CIFAR-100, and FASHION-MNIST datasets under various noise conditions validate the effectiveness of the proposed loss functions. The results demonstrate significant improvements in classification accuracy compared to traditional CCE and MAE, under both uniform and class-dependent noise scenarios.
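The $\mathcal{L}_q$ loss has a closed form, $\mathcal{L}_q(f(x), e_j) = \frac{1 - f_j(x)^q}{q}$, where $f_j(x)$ is the softmax probability the model assigns to the labeled class $j$. A minimal NumPy sketch of this formula (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized Cross Entropy: L_q = (1 - p_y^q) / q.

    probs  : (N, C) array of softmax probabilities
    labels : (N,) integer class labels
    q      : in (0, 1]; q -> 0 recovers CCE, q = 1 gives MAE (up to scale)
    """
    p_y = probs[np.arange(len(labels)), labels]  # probability on the labeled class
    return (1.0 - p_y ** q) / q
```

As $q \to 0$ the expression converges to $-\log f_j(x)$, i.e. standard CCE, and at $q = 1$ it equals $1 - f_j(x)$, a scaled MAE; intermediate $q$ (the paper uses $q = 0.7$ as a default) trades off between the two.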

Theoretical Framework

The paper provides a theoretical analysis underpinning the robustness of the proposed loss functions. Among the key points:

  • Noise Tolerance of $\mathcal{L}_q$: The authors show that $\mathcal{L}_q$ is bounded and noise-tolerant under uniform noise, provided the noise rate satisfies $\eta < \frac{c-1}{c}$, where $c$ is the number of classes.
  • Generalized Noise Conditions: For class-dependent noise, noise tolerance holds if the risk under clean data is zero, confirming the robustness of the loss function across varied noise patterns.
  • Gradient Behavior: The gradient of $\mathcal{L}_q$ adapts the emphasis placed on each sample, curtailing overfitting to noisy labels while maintaining learning efficiency. This alleviates the slow convergence and accuracy drop observed with MAE.
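The interpolation is visible in the derivative of the loss with respect to the labeled-class probability $p$: $\frac{d}{dp}\frac{1-p^q}{q} = -p^{\,q-1}$, which lies between the CCE gradient ($-1/p$) and the MAE gradient ($-1$). A small numeric illustration (the helper name is ours):

```python
import numpy as np

def lq_grad(p, q):
    """d/dp of (1 - p^q)/q = -p^(q-1): the loss gradient with respect
    to the probability p the model assigns to the labeled class."""
    return -(p ** (q - 1.0))

# A confidently fit sample (p = 0.9) vs. a poorly fit, possibly
# mislabeled one (p = 0.05):
for p in (0.9, 0.05):
    cce_like = lq_grad(p, 1e-9)   # q -> 0: approximately -1/p (CCE)
    gce      = lq_grad(p, 0.7)    # intermediate q
    mae      = lq_grad(p, 1.0)    # q = 1: constant -1 (MAE)
    print(f"p={p}: CCE~{cce_like:.2f}  GCE(q=0.7)={gce:.2f}  MAE={mae:.2f}")
```

Under CCE the poorly fit sample dominates the update (gradient magnitude $1/0.05 = 20$), which is exactly how noisy labels get memorized; $\mathcal{L}_q$ dampens that sample's influence without flattening all samples to equal weight as MAE does.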

Experimental Results

The experiments laid out in the paper assess the robustness and efficacy of the $\mathcal{L}_q$ and truncated $\mathcal{L}_q$ loss functions. A concise overview includes:

  • Uniform Noise: The GCE loss outperforms CCE and MAE across various noise levels, mitigating the effects of noisy labels and achieving superior classification accuracy.
  • Class-Dependent Noise: Performance under class-dependent noise demonstrates that the truncated $\mathcal{L}_q$ loss achieves results comparable to methods that rely on a known confusion matrix.
  • Open-Set Noise: When evaluated with a mixture of CIFAR-10 and CIFAR-100 datasets (the latter serving as open-set noise), the truncated GCE loss outperforms previous state-of-the-art methods, further affirming its robustness and applicability.
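The truncated variant caps the loss for low-confidence samples: any sample whose labeled-class probability falls below a threshold $k$ contributes the constant $\mathcal{L}_q(k) = (1 - k^q)/q$ and therefore receives zero gradient, effectively pruning likely-mislabeled examples. A minimal NumPy sketch (interface is illustrative):

```python
import numpy as np

def truncated_gce(probs, labels, q=0.7, k=0.5):
    """Truncated L_q loss: clamping p_y at k from below makes every
    sample with p_y <= k contribute the constant (1 - k^q)/q, so it
    no longer influences the gradient."""
    p_y = probs[np.arange(len(labels)), labels]
    return (1.0 - np.maximum(p_y, k) ** q) / q
```

Note that the paper optimizes this with an alternating scheme over sample weights; the sketch above shows only the loss itself.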

Practical Implications and Future Directions

The proposed loss functions offer immediate practical benefits for training DNNs on noisy datasets, allowing larger, imperfectly labeled datasets to be used without compromising model accuracy. This is particularly relevant in domains where clean, large-scale data annotation is infeasible.

The paper opens several avenues for future research, including:

  • Optimization of Hyperparameters: Further work could explore automated tuning mechanisms for the parameter $q$ and the truncation threshold $k$ to adaptively balance noise robustness and learning dynamics.
  • Application to Real-World Noisy Datasets: Extending the experimental validation to a broader array of real-world datasets could provide additional insights into the practical utility of the proposed methods.
  • Integration with Advanced DNN Architectures: Examining the synergy between GCE loss and more recent DNN architectures, such as transformer models, could yield further advancements in robustness and performance.

Overall, the "Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels" provides a sound and implementable solution to a prevalent issue in deep learning, backed by rigorous theoretical and empirical analysis. The methods introduced promise to enhance the reliability and accuracy of DNN training in the presence of noisy labels, paving the way for more resilient AI applications.