
Learning Deep Networks from Noisy Labels with Dropout Regularization (1705.03419v1)

Published 9 May 2017 in cs.CV, cs.LG, and stat.ML

Abstract: Large datasets often have unreliable labels, such as those obtained from Amazon's Mechanical Turk or social media platforms, and classifiers trained on mislabeled datasets often exhibit poor performance. We present a simple, effective technique for accounting for label noise when training deep neural networks. We augment a standard deep network with a softmax layer that models the label noise statistics. Then, we train the deep network and noise model jointly via end-to-end stochastic gradient descent on the (perhaps mislabeled) dataset. The augmented model is overdetermined, so in order to encourage the learning of a non-trivial noise model, we apply dropout regularization to the weights of the noise model during training. Numerical experiments on noisy versions of the CIFAR-10 and MNIST datasets show that the proposed dropout technique outperforms state-of-the-art methods.

Citations (177)

Summary

  • The paper introduces a dropout-regularized softmax noise model to effectively train deep networks with mislabeled data.
  • It demonstrates enhanced classification accuracy on benchmarks such as CIFAR-10 and MNIST under various noise conditions.
  • The approach prevents overfitting to erroneous labels, promoting reliable feature learning and robust image clustering.

Learning Deep Networks from Noisy Labels with Dropout Regularization

In the context of deep learning, label noise presents a significant hindrance to the performance of classifiers, particularly when dealing with large datasets sourced from non-expert platforms such as Amazon's Mechanical Turk. The paper "Learning Deep Networks from Noisy Labels with Dropout Regularization" by Ishan Jindal, Matthew Nokleby, and Xuewen Chen addresses this issue by presenting a novel technique to train deep neural networks (DNNs) effectively despite the presence of mislabeled data.

The authors propose augmenting a standard deep network with an additional softmax layer that models the label noise statistics. The augmented model is trained end-to-end, optimizing the deep network parameters and the noise model jointly via stochastic gradient descent (SGD). Because the augmented model is overdetermined, the key ingredient is dropout regularization applied to the weights of the softmax noise model, which encourages the learning of a non-trivial noise model instead of letting the network fit the noisy labels directly.
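As a rough sketch of this architecture (not the authors' implementation; `base_net`, the dropout rate, and the identity initialization are assumptions), the noise model can be written as an extra linear layer on top of the base network's softmax output, with dropout applied to its weights during training:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLabelModel(nn.Module):
    """Base classifier followed by a dropout-regularized noise-adaptation layer.
    Illustrative sketch only: layer names, sizes, and the dropout rate are assumptions."""

    def __init__(self, base_net: nn.Module, num_classes: int, p_drop: float = 0.5):
        super().__init__()
        self.base_net = base_net                          # standard deep network
        self.p_drop = p_drop
        # Square weight matrix of the noise model; identity init assumes "no noise" at the start
        self.noise_weight = nn.Parameter(torch.eye(num_classes))

    def forward(self, x):
        clean_logits = self.base_net(x)                   # beliefs over the *true* labels
        clean_probs = F.softmax(clean_logits, dim=1)
        # Dropout on the noise model's weights, active during training only
        w = F.dropout(self.noise_weight, p=self.p_drop, training=self.training)
        noisy_logits = clean_probs @ w.t()                # beliefs over the *observed* labels
        return noisy_logits, clean_probs

# Training fits the noisy output to the (possibly mislabeled) targets end-to-end;
# at test time, clean_probs is used for prediction, e.g.:
#   noisy_logits, clean_probs = model(images)
#   loss = F.cross_entropy(noisy_logits, noisy_labels)
```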

Methodology and Results

The paper introduces a probabilistic model of label noise defined by a column-stochastic matrix whose entries give the probability that a true label is flipped to each observed label. In their experiments, the authors consider both uniform and non-uniform noise models and apply the technique to standard image datasets such as CIFAR-10 and MNIST. Dropout regularization effectively forces the noise model to overestimate the label-flip probabilities, yielding a "pessimistic" model that denoises labels during training.
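For concreteness, a uniform version of such a column-stochastic noise matrix, together with a routine that samples flipped labels from it, might be built as follows (an illustrative sketch; the function names are not from the paper):

```python
import numpy as np

def uniform_noise_matrix(num_classes: int, flip_prob: float) -> np.ndarray:
    """Column-stochastic matrix Q with Q[i, j] = P(observed label i | true label j):
    a label is kept with probability 1 - flip_prob, otherwise flipped uniformly."""
    Q = np.full((num_classes, num_classes), flip_prob / (num_classes - 1))
    np.fill_diagonal(Q, 1.0 - flip_prob)
    return Q

def corrupt_labels(labels: np.ndarray, Q: np.ndarray, seed: int = 0) -> np.ndarray:
    """Draw a noisy label for each true label from the corresponding column of Q."""
    rng = np.random.default_rng(seed)
    num_classes = Q.shape[0]
    return np.array([rng.choice(num_classes, p=Q[:, y]) for y in labels])
```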

Empirical studies demonstrate that the dropout-regularized model outperforms existing methods for handling noisy labels in nearly all test cases, even surpassing genie-aided models with known noise statistics in classification accuracy. On CIFAR-10, for instance, the dropout approach achieved lower classification error rates than both the baseline and trace-regularized methods, particularly under uniform noise and at higher noise levels.

Implications

The proposed approach highlights the utility of dropout in learning robust noise models. By introducing multiplicative noise during training, dropout regularization prevents overfitting to noisy labels and encourages the network to cluster images by their inherent features rather than by corrupted labels. This aligns with findings that suggest deep networks perform better when the learning process implicitly allows for the natural clustering of data.
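The multiplicative-noise view mentioned above can be made explicit: applying dropout to a weight matrix amounts to multiplying it elementwise by a Bernoulli mask (a minimal illustration using inverted-dropout scaling; not tied to the authors' code):

```python
import torch

def weight_dropout(weight: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Dropout as multiplicative noise: scale weights by a Bernoulli(1 - p) mask
    and renormalize so the expected value of each weight is unchanged."""
    mask = torch.bernoulli(torch.full_like(weight, 1.0 - p))
    return weight * mask / (1.0 - p)
```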

Future Directions

The work opens several avenues for further research, such as understanding the dynamics of pessimistic noise models in various types of deep architectures and exploring the application scope across different datasets and noise models. Investigating whether dropout can be optimally tuned or combined with other regularization techniques for different levels and types of label noise would be valuable. Furthermore, probing into the one-to-one relationship between softmax and linear noise models could offer deeper insights into optimizing label noise handling mechanisms.

In summary, this paper presents a robust methodology for improving the accuracy of deep networks trained on noisy datasets, offering significant practical implications for enhancing the reliability of large-scale data-driven AI systems. With further development, these findings have the potential to refine data processing techniques across various domains where label noise is prevalent.