CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images (1808.01097v4)

Published 3 Aug 2018 in cs.CV

Abstract: We present a simple yet efficient approach capable of training deep neural networks on large-scale weakly-supervised web images, which are crawled raw from the Internet by using text queries, without any human annotation. We develop a principled learning strategy by leveraging curriculum learning, with the goal of handling a massive amount of noisy labels and data imbalance effectively. We design a new learning curriculum by measuring the complexity of data using its distribution density in a feature space, and rank the complexity in an unsupervised manner. This allows for an efficient implementation of curriculum learning on large-scale web images, resulting in a high-performance CNN model, where the negative impact of noisy labels is reduced substantially. Importantly, we show by experiments that those images with highly noisy labels can surprisingly improve the generalization capability of the model, by serving as a manner of regularization. Our approaches obtain state-of-the-art performance on four benchmarks: WebVision, ImageNet, Clothing-1M and Food-101. With an ensemble of multiple models, we achieved a top-5 error rate of 5.2% on the WebVision challenge for 1000-category classification. This result was the top performance by a wide margin, outperforming second place by a nearly 50% relative error rate. Code and models are available at: https://github.com/MalongTech/CurriculumNet .

Authors (7)
  1. Sheng Guo (49 papers)
  2. Weilin Huang (61 papers)
  3. Haozhi Zhang (3 papers)
  4. Chenfan Zhuang (2 papers)
  5. Dengke Dong (3 papers)
  6. Matthew R. Scott (21 papers)
  7. Dinglong Huang (1 paper)
Citations (317)

Summary

  • The paper introduces a novel curriculum learning strategy to mitigate noisy labels in large-scale web image datasets while improving CNN generalization.
  • It organizes training data by complexity using an unsupervised ranking in feature space, enabling multi-stage learning from clean to noisy samples.
  • Empirical results show significant improvements on benchmarks like WebVision and ImageNet, achieving a top-5 error rate of 5.2% on WebVision.

CurriculumNet: A Strategic Approach to Weakly Supervised Learning

The paper "CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images" presents a method of training convolutional neural networks (CNNs) on large datasets collected from the web without manual annotation. The unique challenge with such datasets is the presence of massive noisy labels and data imbalance. By leveraging curriculum learning, the authors introduce a novel approach to mitigate the negative effects of noisy labels while simultaneously enhancing model generalization capabilities.

Core Contributions

  1. Curriculum Design Based on Data Complexity:
    • The authors design a learning curriculum by measuring data complexity through distribution density in a deep feature space. This unsupervised strategy ranks training samples from easy (densely clustered, likely correctly labeled) to complex (sparse, likely mislabeled), organizing the data for efficient curriculum-based learning; see the first sketch after this list.
  2. Training Strategy:
    • The learning strategy is a multi-stage process: training begins on a "clean" subset, continues with a "noisy" subset, and concludes with a "highly noisy" subset. This progression follows the curriculum-learning principle of mastering simple structures before transitioning to more challenging examples; see the second sketch after this list.
  3. Empirical Validation on Benchmarks:
    • The CurriculumNet approach is validated on several benchmarks, showing noticeable improvements over baseline models on the WebVision, ImageNet, Clothing-1M, and Food-101 datasets. In particular, an ensemble of models achieves a top-5 error rate of 5.2% on the WebVision challenge, a nearly 50% relative improvement over the second-place entry.
  4. Analysis of Noisy Data:
    • An important insight from the paper is that highly noisy data can act as a form of regularization, improving the model's generalization capabilities instead of degrading its performance, contrary to common expectations.
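
To make item 1 concrete, below is a minimal sketch of how a density-based curriculum might be constructed. It is a simplification, not the paper's exact algorithm (which builds on density-peak clustering over deep features extracted by an initially trained model): it assumes per-class features are already computed, and it substitutes k-means over scalar densities for the paper's three-way split.

```python
import numpy as np
from sklearn.cluster import KMeans

def rank_by_density(features, n_subsets=3, cutoff_percentile=60):
    """Split one category's samples into subsets ordered from
    dense (likely clean) to sparse (likely highly noisy).

    features: (n, d) array of deep features for a single class.
    Returns a list of index arrays, densest subset first.
    """
    # Pairwise Euclidean distances within the class.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))

    # Local density: number of neighbors within a cutoff distance,
    # with the cutoff chosen as a percentile of pairwise distances.
    d_c = np.percentile(dists, cutoff_percentile)
    density = (dists < d_c).sum(axis=1).astype(float)

    # Group samples into n_subsets by density. K-means on the scalar
    # density is a stand-in for the paper's density-based clustering.
    labels = KMeans(n_clusters=n_subsets, n_init=10).fit_predict(
        density[:, None])

    # Order clusters from highest mean density (clean) to lowest.
    order = np.argsort([-density[labels == c].mean()
                        for c in range(n_subsets)])
    return [np.where(labels == c)[0] for c in order]
```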
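
Item 2's staged training could then look like the following sketch, here in PyTorch. The helper name, batch size, epochs per stage, and optimizer schedule are illustrative placeholders rather than the paper's settings:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def curriculum_train(model, dataset, subset_indices, optimizer,
                     loss_fn, epochs_per_stage=5, device="cpu"):
    """Train in stages, cumulatively adding subsets from clean to
    highly noisy (subset_indices is ordered densest first)."""
    model.to(device)
    seen = []
    for stage, idx in enumerate(subset_indices):
        # Each stage adds the next-noisiest subset to the pool.
        seen.append(Subset(dataset, idx.tolist()))
        loader = DataLoader(ConcatDataset(seen), batch_size=64,
                            shuffle=True)
        for epoch in range(epochs_per_stage):
            for images, targets in loader:
                images, targets = images.to(device), targets.to(device)
                optimizer.zero_grad()
                loss = loss_fn(model(images), targets)
                loss.backward()
                optimizer.step()
        print(f"stage {stage}: trained on {len(loader.dataset)} images")
```

The design choice worth noting is that all subsets eventually participate in training: the model first converges on the densest, most reliably labeled samples before noisier ones are mixed in, which is what allows the highly noisy subset to act as regularization rather than as a corrupting signal.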

Implications and Future Directions

The implications of this research are noteworthy. By demonstrating an efficient way to utilize large-scale, weakly annotated datasets, CurriculumNet opens pathways to more cost-effective and scalable training of complex models. It underscores the potential of leveraging noisy data intelligently to enhance learning outcomes, offering a robust alternative to traditional, fully supervised training.

Looking forward, the robust handling of noisy data facilitated by CurriculumNet can guide the development of more sophisticated training paradigms for diverse domains beyond object classification. Integration with semi-supervised and unsupervised learning methods could further augment its applicability. Additionally, exploring how these strategies fare with other modalities, such as video or multi-modal data, presents an intriguing future research avenue.

Conclusion

In conclusion, CurriculumNet addresses one of the significant challenges in computer vision: the effective use of large-scale, weakly labeled datasets. By incorporating curriculum learning principles, the authors make a substantial contribution to the field, paving the way for further exploration and application of weakly supervised learning in real-world tasks where data annotation is a barrier. The work illustrates the innovative strategies that become possible when traditional data paradigms are challenged by practicality and scale.