- The paper presents the Mean Teacher method which improves consistency targets by averaging model weights over training steps.
- It demonstrates enhanced scalability and accuracy, achieving a 4.35% error rate on SVHN with only 250 labels, outperforming Temporal Ensembling trained with 1000 labels.
- The approach enables near-supervised performance in SSL, reducing the reliance on large labeled datasets and improving generalization.
An Analysis of Mean Teachers in Semi-Supervised Deep Learning
The paper "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results" by Antti Tarvainen and Harri Valpola addresses an important challenge in semi-supervised learning: leveraging unlabeled data effectively to improve model generalization. Their proposition, the Mean Teacher method, presents a significant advancement over the Temporal Ensembling technique by making the process more scalable and theoretically sound.
Introduction and Background
Semi-supervised learning (SSL) occupies a crucial space in deep learning, especially when obtaining large, high-quality labeled datasets is impractical or costly. Traditional deep learning models risk overfitting when trained with limited labeled data due to their large number of parameters. SSL aims to mitigate this by incorporating unlabeled data during the training process. This paper builds upon the Temporal Ensembling technique, which showed state-of-the-art performance by maintaining an exponential moving average (EMA) of label predictions and enforcing consistency between model predictions and these targets.
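To make the baseline concrete, here is a minimal sketch of the Temporal Ensembling target update that this paper builds on. The variable names and the default decay alpha = 0.6 follow Laine and Aila's formulation, but the snippet is illustrative rather than a faithful reimplementation:

```python
import numpy as np

def update_targets(Z, z, epoch, alpha=0.6):
    """One Temporal Ensembling target update, run once per epoch.

    Z: accumulated EMA of predictions, shape (num_examples, num_classes)
    z: predictions gathered during the current epoch, same shape
    epoch: 1-based epoch index, used for startup bias correction
    """
    Z = alpha * Z + (1 - alpha) * z      # EMA over label predictions
    targets = Z / (1 - alpha ** epoch)   # correct the zero-initialization bias
    return Z, targets

# Z = np.zeros((50000, 10))  # running ensemble predictions, zero-initialized
# after each epoch: Z, targets = update_targets(Z, epoch_predictions, epoch)
```

Note that the targets change only once per epoch and that a prediction matrix must be stored for every training example, which is exactly the scalability bottleneck Mean Teacher removes.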
Mean Teacher Method
The core innovation in this work is the Mean Teacher method. Unlike Temporal Ensembling, which averages label predictions, Mean Teacher averages model weights over successive training steps (a training-step sketch follows the list below). The method offers three key benefits:
- Weight Averaging: The teacher model in Mean Teacher uses an EMA of the student model weights, thereby providing more refined and stable target predictions.
- Scalability: Because the teacher is updated at every training step rather than once per epoch, and no per-example prediction history needs to be stored, Mean Teacher remains feasible for large datasets and online learning.
- Higher Accuracy: The results presented in the paper show that Mean Teacher improves test accuracy significantly and achieves strong performance with fewer labeled examples.
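The following PyTorch-style sketch shows how these pieces fit together in one training step, assuming a labeled and an unlabeled batch are available. The helper names (`update_teacher`, `training_step`, `w_cons`) are illustrative, and the paper's input noise, data augmentation, and consistency-weight ramp-up are omitted for brevity:

```python
import copy
import torch
import torch.nn.functional as F

def update_teacher(student, teacher, alpha=0.999):
    """EMA update: teacher weights track the student after every step."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)

def training_step(student, teacher, optimizer, x_labeled, y, x_unlabeled, w_cons=1.0):
    # Classification cost on the labeled batch (student only).
    class_loss = F.cross_entropy(student(x_labeled), y)

    # Consistency cost: the student's predictions should match the teacher's
    # on the same inputs. The paper perturbs each model's input with separate
    # noise/augmentation; that detail is omitted here.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x_unlabeled), dim=1)
    student_probs = F.softmax(student(x_unlabeled), dim=1)
    cons_loss = F.mse_loss(student_probs, teacher_probs)

    loss = class_loss + w_cons * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_teacher(student, teacher)  # weight averaging happens every step
    return loss.item()

# The teacher starts as a copy of the student and receives no gradients:
# teacher = copy.deepcopy(student)
# for p in teacher.parameters():
#     p.requires_grad_(False)
```

In the paper, the consistency weight is ramped up from zero early in training, and the EMA decay is smaller at the start (around 0.99) and larger later (0.999), so the teacher quickly forgets the student's early, inaccurate weights.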
Experimental Results
The experiments conducted on the SVHN and CIFAR-10 datasets illustrate the efficacy of Mean Teacher. Notably, on SVHN with only 250 labels, Mean Teacher achieved an error rate of 4.35%, outperforming Temporal Ensembling trained with 1000 labels (4.42%). Similarly, combining Mean Teacher with Residual Networks brought the error rate on CIFAR-10 with 4000 labels down to 6.28%, a marked improvement over the previous state of the art of 10.55%.
SVHN with Extra Unlabeled Data
An additional set of experiments using extra unlabeled data on SVHN further validated Mean Teacher's effectiveness. The error rate continued to drop as more unlabeled data was added, indicating that the method harnesses unlabeled data robustly and effectively.
Theoretical Implications
The Mean Teacher method contributes to the ongoing understanding of SSL by emphasizing the importance of stable and accurate target predictions via weight averaging. It aligns with earlier SSL theories that propose exploiting the manifold structure of data for improved generalization but does so more efficiently.
Practical Implications and Future Work
Practically, the findings suggest that semi-supervised learning can approach fully supervised performance with significantly fewer labels, which is highly relevant for domains where labeling is expensive or time-consuming. Future work could explore combining Mean Teacher with techniques such as Virtual Adversarial Training, which the authors hypothesize could yield even better results. Additionally, applying Mean Teacher to other domains and datasets would help validate its generalizability and robustness.
Conclusion
In summary, the Mean Teacher method presents a compelling advancement in the field of semi-supervised learning. By averaging model weights instead of label predictions, it provides a faster, more scalable, and more accurate framework for leveraging unlabeled data. This paper offers a notable contribution to SSL research, promising both theoretical insights and practical benefits for deep learning applications.