Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey (1912.05170v3)

Published 11 Dec 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Image classification systems recently made a giant leap with the advancement of deep neural networks. However, these systems require an excessive amount of labeled data to be adequately trained. Gathering a correctly annotated dataset is not always feasible due to several factors, such as the expense of the labeling process or the difficulty of correctly classifying data, even for experts. Because of these practical challenges, label noise is a common problem in real-world datasets, and numerous methods to train deep neural networks with label noise have been proposed in the literature. Although deep neural networks are known to be relatively robust to label noise, their tendency to overfit makes them vulnerable to memorizing even random noise. It is therefore crucial to account for label noise and develop algorithms that counteract its adverse effects so that deep neural networks can be trained efficiently. Even though an extensive survey of machine learning techniques under label noise exists, the literature lacks a comprehensive survey of methodologies centered explicitly around deep learning in the presence of noisy labels. This paper presents these algorithms while categorizing them into one of two subgroups: noise model based and noise model free methods. Algorithms in the first group aim to estimate the noise structure and use this information to avoid the adverse effects of noisy labels. In contrast, methods in the second group seek inherently noise-robust algorithms using approaches such as robust losses, regularizers, or other learning paradigms.

Citations (298)

Summary

  • The paper systematically surveys challenges and methods for mitigating label noise in deep image classification.
  • It categorizes techniques into noise model based methods like label cleaning and pruning, and noise model free approaches such as robust loss functions and meta learning.
  • The study highlights future research directions including scalable frameworks, analyzing learning dynamics under noise, and establishing standardized datasets for evaluation.

Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey

The paper "Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey" systematically explores the challenges and methodologies associated with addressing label noise in deep learning systems, specifically within image classification frameworks. While advancements in deep neural networks have significantly improved image classification, the necessity for large, accurately annotated datasets poses practical challenges, leading to prevalent label noise in many real-world datasets.

Problem Context and Challenges

Supervised learning's demand for abundant labeled data runs up against costly and error-prone annotation processes, so label noise is frequently introduced into training sets, and deep neural networks can overfit these erroneous labels. The paper distinguishes feature noise from label noise and argues that the latter is uniquely problematic: whereas noise spread across many input features can partially average out, label noise corrupts the single supervisory signal directly, which both skews classification outcomes and leaves less room for mitigation.
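
For intuition, experiments in this literature commonly simulate symmetric (uniform) label noise by flipping a fraction of training labels to a randomly chosen other class. Below is a minimal NumPy sketch of such an injection routine; the function name and parameters are illustrative, not taken from the paper:

```python
import numpy as np

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a uniformly chosen *different* class
    with probability `noise_rate` (symmetric/uniform label noise)."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        # Choose among the other num_classes - 1 classes.
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy

# Example: corrupt ~20% of CIFAR-10-style labels.
clean = np.random.randint(0, 10, size=1000)
noisy = inject_symmetric_noise(clean, num_classes=10, noise_rate=0.2)
print((clean != noisy).mean())  # close to 0.2
```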

Key Methodologies

The survey categorizes noise management strategies into two primary groups: noise model based methods and noise model free methods.

Noise Model Based Methods

These methods attempt to directly estimate or counteract the noise distribution within the dataset:

  • Noisy Channel Methods estimate a noise transition matrix and pass the model's predictions through it so that training matches the observed noisy label distribution; variants compute the noise structure explicitly or refine it iteratively (see the forward-correction sketch after this list).
  • Label Cleaning involves correcting mislabeled data to improve training efficacy. Techniques vary depending on available clean data and involve either complete or partial corrections.
  • Dataset Pruning removes or de-emphasizes noisy data instances, sometimes leveraging clean-label subsets to guide pruning.
  • Sample Choosing dynamically selects training samples to prioritize noise-free or informative instances, employing strategies like curriculum learning, where easier samples are learned first.
  • Sample Importance Weighting assigns different weights to samples based on their estimated noise levels, emphasizing less noisy samples during training.
  • Labeler Quality Assessment focuses on understanding the variations in label quality from different annotators, often using techniques like EM algorithms for estimation.
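
To make the noisy channel idea concrete, the sketch below shows forward loss correction, one standard technique in this family: the network's softmax output is multiplied by a noise transition matrix T before the loss is computed, so the model learns to explain the noisy labels while its softmax converges toward the clean distribution. Here T is assumed known; in practice it must itself be estimated, and the function name and toy matrix are our illustration rather than code from the survey:

```python
import torch
import torch.nn.functional as F

def forward_corrected_loss(logits, noisy_targets, T):
    """Cross entropy on predictions pushed through a noise
    transition matrix T, where T[i, j] = P(observed = j | true = i)."""
    clean_probs = F.softmax(logits, dim=1)   # model's estimate of the true class
    noisy_probs = clean_probs @ T            # implied distribution over noisy labels
    return F.nll_loss(torch.log(noisy_probs + 1e-8), noisy_targets)

# Toy setup: 3 classes; class 0 is mislabeled as class 1 30% of the time.
T = torch.tensor([[0.7, 0.3, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
logits = torch.randn(4, 3, requires_grad=True)
noisy_targets = torch.tensor([1, 0, 2, 1])
loss = forward_corrected_loss(logits, noisy_targets, T)
loss.backward()  # gradients flow through the corrected loss
```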

Noise Model Free Methods

These techniques aim for inherently robust strategies without directly modeling noise:

  • Robust Loss Functions explore alternative loss formulations that are less prone than standard cross entropy to overfitting label noise (a generalized cross entropy sketch follows this list).
  • Meta Learning tunes the learning procedure itself for noise tolerance, often using a small clean subset to guide the adaptation.
  • Regularizers such as dropout and adversarial training are leveraged to prevent overfitting to noise.
  • Ensemble Methods employ multiple models to mitigate the impact of noise through averaged predictions.
  • Other Techniques include methods such as complementary labels and prototype learning that enhance robustness without explicit noise modeling.
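
As one widely studied member of the robust loss family, the generalized cross entropy loss of Zhang and Sabuncu (2018) interpolates between standard cross entropy (as q → 0) and the noise-robust mean absolute error (q = 1). A minimal PyTorch sketch follows, with q = 0.7 as in that paper's default; the code is our illustration, not the survey's:

```python
import torch
import torch.nn.functional as F

def generalized_cross_entropy(logits, targets, q=0.7):
    """GCE loss L_q = (1 - p_y^q) / q, where p_y is the predicted
    probability of the labeled class. q -> 0 recovers cross entropy;
    q = 1 gives the more noise-robust MAE."""
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()

# Drop-in replacement for F.cross_entropy in a training loop.
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss = generalized_cross_entropy(logits, targets)
loss.backward()
```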

Implications and Prospective Directions

The paper emphasizes the need for robust methodologies capable of handling not only synthetic noise in controlled settings but also the complex, real-world noise encountered in label acquisition processes like crowdsourcing and web data aggregation. Exploring the interplay between deep learning architectures and label noise can unveil further insights into optimizing learning from noisy datasets.

Future research opportunities highlighted include the development of scalable frameworks for small datasets, investigations into learning dynamics under various noise types, and leveraging noisy datasets as additional supervision in semi-supervised learning configurations. Establishing standardized noisy datasets for evaluation can aid in the objective comparison of methodologies, thereby facilitating advancements in this domain.

In summary, the paper provides a comprehensive overview of existing strategies and methodologies for handling label noise in deep learning, effectively setting a foundation for continued research and development in creating more resilient image classification systems under noisy conditions.