A Baseline for Few-Shot Image Classification (1909.02729v5)

Published 6 Sep 2019 in cs.LG, cs.CV, and stat.ML

Abstract: Fine-tuning a deep network trained with the standard cross-entropy loss is a strong baseline for few-shot learning. When fine-tuned transductively, this outperforms the current state-of-the-art on standard datasets such as Mini-ImageNet, Tiered-ImageNet, CIFAR-FS and FC-100 with the same hyper-parameters. The simplicity of this approach enables us to demonstrate the first few-shot learning results on the ImageNet-21k dataset. We find that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes. We do not advocate our approach as the solution for few-shot learning, but simply use the results to highlight limitations of current benchmarks and few-shot protocols. We perform extensive studies on benchmark datasets to propose a metric that quantifies the "hardness" of a few-shot episode. This metric can be used to report the performance of few-shot algorithms in a more systematic way.

Citations (550)

Summary

  • The paper introduces transductive fine-tuning as a strong baseline that leverages unlabeled test (query) data to adapt both the feature extractor and the classifier.
  • It initializes the classifier from support samples, setting class weights to maximize cosine similarity with support features, and outperforms state-of-the-art methods on datasets like Mini-ImageNet and Tiered-ImageNet.
  • The method scales to large datasets such as ImageNet-21k, challenging the need for complex meta-learning algorithms.

Few-Shot Image Classification Through Transductive Fine-Tuning

The paper "A Baseline for Few-Shot Image Classification" presents a systematic approach to few-shot learning by advocating for transductive fine-tuning as a robust and straightforward baseline. This method challenges the intricate models dominating the few-shot learning space, demonstrating that simplicity paired with careful design choices yields competitive or superior results.

Overview

The authors propose that fine-tuning a deep neural network, initially trained with the standard cross-entropy loss, already provides strong performance in few-shot scenarios. Performance improves further with transductive fine-tuning, in which the unlabeled test (query) samples of an episode are used during inference. With a single set of hyperparameters, the approach outperforms state-of-the-art methods on standard datasets such as Mini-ImageNet, Tiered-ImageNet, CIFAR-FS, and FC-100.
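
Concretely (notation ours, following the paper's description), the transductive objective augments the standard cross-entropy loss on the labeled support set with the Shannon entropy of the model's predictions on the unlabeled query set:

```latex
\min_{\theta}\;
\frac{1}{N_s}\sum_{i=1}^{N_s} -\log p_{\theta}(y_i \mid x_i)
\;+\;
\frac{1}{N_q}\sum_{j=1}^{N_q} \mathbb{H}\big(p_{\theta}(\,\cdot \mid x'_j)\big)
```

where $(x_i, y_i)$ are the labeled support samples, $x'_j$ the unlabeled query samples, and $\mathbb{H}$ denotes Shannon entropy. The entropy term encourages confident (low-entropy) predictions on the episode's test data, which is what makes the procedure transductive.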

Key Contributions

  1. Transductive Fine-Tuning: The paper introduces a baseline that leverages unlabeled test data during fine-tuning, adapting a model trained on a separate meta-training dataset. Both the classifier and the feature extractor are updated using information from the specific task at hand (see the sketch after this list).
  2. Support-Based Initialization: Drawing on deep metric learning, the classifier weights are initialized from support-sample features, which maximizes the cosine similarity between class weights and support features.
  3. Benchmark Results: Across popular few-shot datasets, the method surpasses existing benchmarks without specialized training per dataset or per few-shot protocol.
  4. Scalability: The method is tested on the large-scale ImageNet-21k dataset, demonstrating its feasibility and robustness in few-shot scenarios with many classes.
  5. Episode Hardness: Through extensive studies on benchmark datasets, the authors propose a metric that quantifies the "hardness" of a few-shot episode, enabling more systematic reporting of few-shot performance.
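
A minimal PyTorch sketch of the first two ideas follows (our illustration, not the authors' released code); the cosine classifier, the temperature `temp`, and the optimizer settings are assumptions consistent with the paper's description:

```python
import torch
import torch.nn.functional as F

def support_based_init(backbone, support_x, support_y, num_classes):
    """Initialize classifier weights from support features.

    Each class weight is the normalized mean feature of that class's
    support samples, maximizing cosine similarity between weights and
    support features.
    """
    with torch.no_grad():
        feats = F.normalize(backbone(support_x), dim=-1)          # (N_s, D)
    weights = torch.stack(
        [feats[support_y == c].mean(dim=0) for c in range(num_classes)]
    )                                                             # (C, D)
    return F.normalize(weights, dim=-1).clone().requires_grad_()

def transductive_loss(backbone, weights, support_x, support_y, query_x,
                      temp=10.0):
    """Cross-entropy on labeled support + Shannon entropy on unlabeled queries."""
    s_logits = temp * F.normalize(backbone(support_x), dim=-1) @ weights.t()
    q_logits = temp * F.normalize(backbone(query_x), dim=-1) @ weights.t()
    ce = F.cross_entropy(s_logits, support_y)
    q_probs = q_logits.softmax(dim=-1)
    entropy = -(q_probs * q_probs.clamp_min(1e-12).log()).sum(-1).mean()
    return ce + entropy

# Per-episode usage (sketch): jointly fine-tune the backbone and classifier
# for a few steps on one few-shot episode.
#   weights = support_based_init(backbone, sx, sy, num_classes=5)
#   opt = torch.optim.Adam(list(backbone.parameters()) + [weights], lr=5e-5)
#   for _ in range(25):
#       loss = transductive_loss(backbone, weights, sx, sy, qx)
#       opt.zero_grad(); loss.backward(); opt.step()
```

Because the entropy term touches only unlabeled query inputs, dropping it recovers ordinary (inductive) fine-tuning with the same initialization.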

Numerical Results and Claims

The proposed approach achieves notable accuracies, such as 68.11% on the 1-shot, 5-way Mini-ImageNet task, exceeding prior methods. On ImageNet-21k, it reaches 58.04% in the 5-shot, 20-way setting, demonstrating applicability to large-scale tasks.

Theoretical and Practical Implications

Theoretically, this paper challenges the perceived necessity of complex meta-learning algorithms, suggesting that improvements may instead derive from leveraging traditional supervised learning techniques alongside transductive methods. Practically, it opens new avenues for few-shot systems by emphasizing simplicity, robustness, and efficiency, particularly in dealing with vast and heterogeneous datasets.

Future Directions

The implications of transductive fine-tuning suggest potential exploration into hybrid models combining transduction with other semi-supervised learning techniques. Additionally, while the paper deliberately uses a single hyperparameter configuration across datasets, per-dataset tuning could yield further gains, at the cost of some generality.

Conclusion

The paper advocates a reevaluation of the few-shot learning landscape, showing that a simple transductive baseline is both reliable and scalable. The authors do not present it as the solution to few-shot learning; rather, the results highlight limitations of current benchmarks and protocols, and the proposed hardness metric offers a more systematic way to report the performance of few-shot algorithms.
