- The paper presents a dynamic weighting mechanism, based on the angle θ between sample features and class vectors, that effectively distinguishes clean from noisy samples during CNN training.
- It employs a three-phase strategy: all samples are weighted equally at first, then lower-angle (likely clean) samples are emphasized, and finally the emphasis shifts to semi-hard samples to refine the model.
- Empirical results on datasets like MS-Celeb-1M and CASIA-WebFace show significant performance gains in face recognition even with noise rates exceeding 50%.
A Noise-Tolerant Paradigm for Training Robust Face Recognition CNNs
The paper "Noise-Tolerant Paradigm for Training Face Recognition CNNs" addresses a critical challenge in the deployment of deep learning models for face recognition (FR)—the presence of noisy data in large-scale training datasets. The authors propose a novel training paradigm that significantly mitigates the impact of noisy data on Convolutional Neural Networks (CNNs) used for face recognition by leveraging the θ distribution of training samples to dynamically adjust their weights throughout the training process.
Problem Context
Face recognition models have benefited substantially from deep CNNs trained on large datasets such as MS-Celeb-1M, which contains vast numbers of facial images across numerous identities. However, the sheer scale of these datasets inherently brings a high rate of noise, mainly mislabeled images, with some datasets exhibiting noise rates exceeding 50%. Conventional data cleaning proves inadequate: it is expensive and rarely removes all mislabeled samples, which motivates a paradigm that can handle noisy data directly.
Proposed Methodology
The proposed paradigm builds on angular-margin-based loss (AM-Loss) functions, such as L2-Softmax and ArcFace, observing that the angle θ between a feature vector and its class vector implicitly indicates how likely a training sample is to be clean. This insight forms the basis of a dynamic sample-weighting mechanism. The key steps of the proposed methodology are (a simplified weighting sketch follows the list):
- Monitor the θ distribution of the training samples, in which clean and mislabeled samples form distinguishable modes.
- Phase 1: weight all samples equally while the model learns coarse, generic features.
- Phase 2: shift emphasis toward low-θ (likely clean) samples so the model fits reliable labels.
- Phase 3: emphasize semi-hard samples to further refine the model.
- As a by-product, use the θ distribution to estimate the dataset's noise rate.
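The paper's actual weighting functions are derived from the evolving θ distribution; the toy schedule below only illustrates the three phases. The phase boundaries, the 0.2-radian temperature, and the function name `phase_weights` are assumptions made for this sketch, not the authors' choices.

```python
import torch

def phase_weights(theta: torch.Tensor, epoch: int, total_epochs: int) -> torch.Tensor:
    """Toy three-phase sample weighting over per-sample angles theta."""
    progress = epoch / total_epochs
    if progress < 1 / 3:
        # Phase 1: uniform weights while the model learns coarse features.
        return torch.ones_like(theta)
    if progress < 2 / 3:
        # Phase 2: favor low-theta (likely clean) samples.
        w = torch.exp(-theta / 0.2)
    else:
        # Phase 3: bell curve around the median angle -> semi-hard samples.
        mid = theta.median()
        w = torch.exp(-((theta - mid) ** 2) / (2 * 0.2 ** 2))
    return w * theta.numel() / w.sum()  # keep the batch's mean weight at 1
```

In training, these weights would multiply the per-sample loss terms before the batch average is taken.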
Empirical Findings and Implications
The authors validate their approach on several datasets, including a noisy version of CASIA-WebFace, the original and refined versions of MS-Celeb-1M, and IMDB-Face. The results show significant gains in face verification over conventional training, particularly at noise rates above 50%. The paradigm can also estimate a dataset's noise rate accurately, providing a practical tool for dataset refinement; a sketch of one such estimator follows.
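One plausible way to read a noise rate off the θ distribution, assuming clean and mislabeled samples form two separable modes (low-θ and high-θ), is to split the histogram at the valley between them. The estimator below, including the name `estimate_noise_rate` and its smoothing constants, is an illustration; the paper's actual estimator may differ.

```python
import numpy as np

def estimate_noise_rate(thetas: np.ndarray, bins: int = 90) -> float:
    """Fraction of samples above the valley between the two theta modes."""
    hist, edges = np.histogram(thetas, bins=bins, range=(0.0, np.pi))
    smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")  # de-noise bins
    lo, hi = bins // 4, 3 * bins // 4                          # search mid-range
    split = lo + int(np.argmin(smooth[lo:hi]))                 # valley index
    return float((thetas > edges[split]).mean())
```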
By allowing FR models to be trained directly on noisy datasets with minimal performance degradation, this noise-tolerant paradigm has clear practical significance: it reduces dependence on cleaned datasets and permits the use of larger, raw datasets, fostering scalability in real-world face recognition applications. The implications also extend to improving robustness in other domains where deep models are trained on noisy data.
Conclusion and Future Perspective
The paper demonstrates a method for overcoming a fundamental barrier in face recognition model training, with likely applications beyond this domain. Future work may refine the weighting mechanism and extend the approach to other machine learning tasks. A theoretical account of why certain weighting schemes outperform others would also open an avenue for deeper academic inquiry.