- The paper proposes an iterative framework that refines noisy label detection and improves CNN training in the presence of open-set and closed-set noise.
- It employs a probabilistic, cumulative extension of the Local Outlier Factor (pcLOF) together with a Siamese network to distinguish clean samples from noisy ones.
- Experiments on datasets like CIFAR-10 and ImageNet show superior robustness and accuracy, validating the approach on both controlled and real-world noisy data.
Iterative Learning with Open-set Noisy Labels
The paper "Iterative Learning with Open-set Noisy Labels" addresses a critical challenge in the training of Convolutional Neural Networks (CNNs)—the pervasive issue of noisy label data, specifically within the open-set noisy label setting. In conventional closed-set scenarios, label noise is assumed to be confined within the known set of classes. However, the open-set situation introduces a layer of complexity where the true class of a mislabeled sample is absent from the set of known classes. This situation is highly relevant in real-world applications, such as those leveraging web-sourced datasets, which inherently contain a mix of in-distribution (closed-set) and out-of-distribution (open-set) noise.
To tackle this, the authors propose an iterative framework for training CNNs effectively even when a significant fraction of labels, both open- and closed-set, is noisy. The framework integrates three core components: iterative noisy-label detection using a probabilistic, cumulative extension of the Local Outlier Factor (LOF); discriminative feature learning via a Siamese network; and a sample-reweighting module on the softmax loss that discounts likely-noisy samples during training. A sketch of the detection step follows below.
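To make the detection step concrete, here is a minimal sketch, assuming deep features have already been extracted by the current CNN, that scores each sample's outlierness within its assigned class using scikit-learn's `LocalOutlierFactor`. The per-class min-max normalization to [0, 1] is an illustrative stand-in for the paper's pcLOF, not the authors' exact formulation.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def per_class_noise_scores(features, labels, n_neighbors=20):
    """Score each sample's outlierness within its assigned class.

    features: (N, D) array of deep features from the current CNN.
    labels:   (N,) array of (possibly noisy) class labels.
    Returns an (N,) array in [0, 1]; higher means more likely mislabeled.
    Note: the min-max squashing to [0, 1] is an illustrative simplification
    of the paper's probabilistic cumulative LOF (pcLOF).
    """
    scores = np.zeros(len(labels), dtype=np.float64)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        k = min(n_neighbors, len(idx) - 1)
        if k < 1:
            continue  # class too small to score
        lof = LocalOutlierFactor(n_neighbors=k)
        lof.fit(features[idx])
        # negative_outlier_factor_ is more negative for stronger outliers
        raw = -lof.negative_outlier_factor_
        # squash to [0, 1] so scores are comparable across classes
        rng = raw.max() - raw.min()
        scores[idx] = (raw - raw.min()) / rng if rng > 0 else 0.0
    return scores
```

In the iterative loop, these scores would be recomputed after each training round using features from the newly updated network, so that detection sharpens as the representations improve.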
The iterative detection of noisy labels relies on the probabilistically cumulative Local Outlier Factor (pcLOF). A sample is scored by how inconsistent its feature representation is with the other samples assigned to the same class, and the detection is refined over iterations as the network produces increasingly discriminative features. Feature learning itself is driven by a Siamese network trained with a contrastive loss, which pulls representations of clean samples together and pushes them away from those of detected noisy samples. The reweighting strategy then assigns each sample a weight based on its estimated noise likelihood, so the model focuses on clean data without entirely discarding potentially useful information in noisy samples.
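The two training losses described above can be sketched in PyTorch as follows; the margin value, the pairing strategy, and the use of (1 - noise score) as a per-sample weight are assumptions for illustration rather than the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_pair, margin=2.0):
    """Siamese contrastive loss on feature pairs.

    z1, z2:    (B, D) embeddings from the two branches.
    same_pair: (B,) float tensor; 1 if the pair should be pulled together
               (e.g. two clean samples of the same class), 0 if pushed apart
               (e.g. a clean sample paired with a detected noisy one).
    """
    d = F.pairwise_distance(z1, z2)
    pull = same_pair * d.pow(2)
    push = (1 - same_pair) * F.relu(margin - d).pow(2)
    return 0.5 * (pull + push).mean()

def reweighted_softmax_loss(logits, targets, noise_scores):
    """Cross-entropy in which likely-noisy samples contribute less.

    noise_scores: (B,) tensor in [0, 1]; 1 means almost certainly mislabeled.
    Using (1 - noise_score) as the weight is an illustrative choice.
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = 1.0 - noise_scores
    return (weights * per_sample).sum() / weights.sum().clamp(min=1e-8)
```

In the full framework, the reweighted classification loss and the contrastive term would be combined into one objective, and the noise scores feeding both would be refreshed at every iteration.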
Evaluations on CIFAR-10, ImageNet, and a web-search dataset demonstrate the robustness of the proposed framework. On CIFAR-10 with 40% open-set noise, the model outperforms state-of-the-art noisy-label methods in classification accuracy. On ImageNet, the framework remains competitive across architectures such as ResNet-50 and Inception-v3, indicating that it scales to large datasets. Evaluation on real-world web data further underscores its practical value, showing that webly supervised data can be leveraged for stronger CNN training.
This research contributes to noisy-label learning by extending the problem to open-set conditions and by showing the value of combining iterative detection, discriminative feature learning, and sample reweighting when the noise is heterogeneous. The results demonstrate the efficacy of the proposed approach and lay a foundation for future work on robust representation learning from ubiquitously noisy datasets. Such work could incorporate stronger feature extractors and additional adaptive noise-handling mechanisms, further improving learning from imperfect data.