Joint Optimization Framework for Learning with Noisy Labels (1803.11364v1)

Published 30 Mar 2018 in cs.CV, cs.LG, and stat.ML

Abstract: Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites, however they tend to contain inaccurate labels that are termed as noisy labels. Training on such noisy labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem, we propose a joint optimization framework of learning DNN parameters and estimating true labels. Our framework can correct labels during training by alternating update of network parameters and labels. We conduct experiments on the noisy CIFAR-10 datasets and the Clothing1M dataset. The results indicate that our approach significantly outperforms other state-of-the-art methods.

Citations (670)

Summary

  • The paper introduces a joint optimization strategy that alternates between updating network parameters and refining label estimates to mitigate overfitting on noisy data.
  • It employs a loss function combining classification, prior probability regularization, and entropy terms to robustly handle label noise.
  • Experimental results on CIFAR-10 and Clothing1M demonstrate improved accuracy and reduced reliance on noise transition matrices compared to state-of-the-art methods.

Joint Optimization Framework for Learning with Noisy Labels

Overview

The paper by Tanaka et al. introduces a novel joint optimization framework designed to address the challenge of training deep neural networks (DNNs) on datasets with noisy labels. This framework simultaneously optimizes both the DNN parameters and the estimated true labels, aiming to mitigate the overfitting tendencies of DNNs on incorrect labels. The authors conduct experiments using noisy CIFAR-10 and Clothing1M datasets, demonstrating enhanced performance over existing state-of-the-art methods.

Motivation and Problem Statement

Deep learning models, particularly DNNs, typically require large-scale datasets with accurate annotations for effective training. However, such datasets often come with noisy labels when collected from automated or semi-automated web sources, posing a risk of performance degradation due to the models' propensity to overfit noisy data. Common approaches like regularization and early stopping have limitations, thus motivating the need for a new framework to handle noisy labeled data more effectively.

Proposed Framework

The primary contribution is a joint optimization strategy that alternates between updating the network parameters and refining label estimates. In contrast to traditional methods that treat noisy labels as static, this approach adapts them throughout training. The framework's core is an optimization problem whose loss function comprises three main components: a classification loss, a prior probability regularization term, and an entropy term (a code sketch of the combined objective follows the list below).

  1. Classification Loss: Implemented using Kullback-Leibler divergence to maintain consistency between predicted and estimated labels.
  2. Prior Probability Regularization: Ensures diversity in label distribution, preventing the model from collapsing to a trivial solution.
  3. Entropy Regularization: Concentrates probability distributions, ensuring that label predictions are decisive and minimizing ambiguity.
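
To make the three-term objective concrete, the PyTorch-style sketch below shows one plausible realization of a loss of the form L = L_c + α·L_p + β·L_e. It is a minimal sketch based on the description above, not the authors' code: the function name, the default values of alpha and beta, and the small epsilon added inside the logarithms are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def joint_optimization_loss(logits, soft_labels, prior, alpha=1.2, beta=0.8):
    """Sketch of the three-term objective described above.

    logits:      network outputs for a mini-batch, shape (n, num_classes)
    soft_labels: current label estimates for the same samples, shape (n, num_classes)
    prior:       assumed prior class distribution, shape (num_classes,)
    alpha, beta: weighting hyperparameters (values here are illustrative)
    """
    eps = 1e-8
    probs = F.softmax(logits, dim=1)  # network predictions s(theta, x)

    # 1. Classification loss: KL divergence between the estimated labels
    #    and the network predictions.
    loss_c = torch.mean(torch.sum(
        soft_labels * (torch.log(soft_labels + eps) - torch.log(probs + eps)), dim=1))

    # 2. Prior probability regularization: keep the batch-mean prediction
    #    close to the prior, preventing collapse to a single class.
    mean_probs = probs.mean(dim=0)
    loss_p = torch.sum(prior * (torch.log(prior + eps) - torch.log(mean_probs + eps)))

    # 3. Entropy term: push per-sample predictions toward low entropy,
    #    i.e., decisive labels.
    loss_e = -torch.mean(torch.sum(probs * torch.log(probs + eps), dim=1))

    return loss_c + alpha * loss_p + beta * loss_e
```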

Methodology

The optimization is performed through an alternating strategy:

  • Network Parameter Update: Uses stochastic gradient descent on the defined loss function.
  • Label Update: Two variants are explored, hard labels and soft labels; soft-label updating performs better because it incorporates prediction confidence directly (a training-loop sketch follows this list).
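
The following sketch illustrates the alternating scheme under stated assumptions; it reuses the joint_optimization_loss function and imports from the previous snippet. The loop structure, the per-epoch label refresh schedule, and the dataset's update_labels helper are hypothetical conveniences introduced here for illustration, not details taken from the paper.

```python
def train_with_label_correction(model, optimizer, loader, prior,
                                num_epochs=10, alpha=1.2, beta=0.8):
    """Alternating optimization sketch: (a) update network weights by SGD on the
    joint loss, then (b) refresh the soft-label estimates from the network's
    own predictions. Assumes `loader` yields (images, soft_labels, indices)."""
    for epoch in range(num_epochs):
        # (a) Network parameter update on the current label estimates.
        model.train()
        for images, soft_labels, indices in loader:
            optimizer.zero_grad()
            loss = joint_optimization_loss(model(images), soft_labels, prior, alpha, beta)
            loss.backward()
            optimizer.step()

        # (b) Soft-label update: replace each label estimate with the network's
        #     softmax prediction for that sample.
        model.eval()
        with torch.no_grad():
            for images, _, indices in loader:
                probs = F.softmax(model(images), dim=1)
                loader.dataset.update_labels(indices, probs)  # hypothetical helper
```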

Key experimental results indicate the framework's ability to prevent memorization of incorrect labels, particularly when training with a high learning rate, reinforcing the observation of Arpit et al. that DNNs fit clean examples before memorizing noisy ones. Keeping the learning rate high during label estimation exploits this gap, allowing the model to distinguish noisy labels from clean ones effectively.

Experimental Results

The framework's efficacy is validated on both synthetic noisy datasets (CIFAR-10 with symmetric and asymmetric noise) and real-world noisy datasets (Clothing1M):

  • CIFAR-10: The proposed method consistently outperforms existing techniques, achieving higher test accuracy and recovery accuracy (the fraction of corrected training labels that match the ground truth) across various noise levels.
  • Clothing1M: In practical scenarios, the framework exceeds the performance of previous methods, notably without necessitating ground-truth noise transition matrices.

Implications and Future Directions

The joint optimization framework presents a significant advancement in handling noisy labeled data, both theoretically and practically. It potentially opens avenues for robust training algorithms that minimize human intervention in curating large datasets. Future developments might explore:

  • Extending the framework to other data modalities such as text and audio.
  • Investigating adaptive learning rate schedules to further enhance recovery accuracy.
  • Integrating unsupervised or semi-supervised learning methods to generalize across varied noise profiles.

The framework's design aligns with broader research trends in AI that emphasize minimal human supervision and robust performance on real-world noisy datasets.
