Generalized Category Discovery (2201.02609v2)

Published 7 Jan 2022 in cs.CV and cs.LG

Abstract: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. Here, the unlabelled images may come from labelled classes or from novel ones. Existing recognition methods are not able to deal with this setting, because they make several restrictive assumptions, such as the unlabelled instances only coming from known - or unknown - classes, and the number of unknown classes being known a-priori. We address the more unconstrained setting, naming it 'Generalized Category Discovery', and challenge all these assumptions. We first establish strong baselines by taking state-of-the-art algorithms from novel category discovery and adapting them for this task. Next, we propose the use of vision transformers with contrastive representation learning for this open-world setting. We then introduce a simple yet effective semi-supervised $k$-means method to cluster the unlabelled data into seen and unseen classes automatically, substantially outperforming the baselines. Finally, we also propose a new approach to estimate the number of classes in the unlabelled data. We thoroughly evaluate our approach on public datasets for generic object classification and on fine-grained datasets, leveraging the recent Semantic Shift Benchmark suite. Project page at https://www.robots.ox.ac.uk/~vgg/research/gcd

Authors (4)

Sagar Vaze
Kai Han
Andrea Vedaldi
Andrew Zisserman

Summary

Generalized Category Discovery: An Expert Overview

"Generalized Category Discovery" by Vaze et al. studies an image recognition setting in which the goal is to categorize every image in an unlabelled set, where each image may belong either to a class already present in the labelled data or to a novel class. Existing recognition methods rely on restrictive assumptions: that unlabelled instances come only from known classes or only from new ones, or that the number of unknown categories is given in advance. The paper names the less constrained setting "Generalized Category Discovery" (GCD) and drops these assumptions. It contributes a formulation of the task, strong baselines adapted from novel category discovery, and an empirical evaluation of a method designed for the setting.

The GCD task is formalized as follows: a dataset comprises a labelled partition, $\mathcal{D}_{\mathcal{L}}$, and an unlabelled partition, $\mathcal{D}_{\mathcal{U}}$. The goal is to assign a label to every image in $\mathcal{D}_{\mathcal{U}}$, which may belong to a class seen in the labelled set or to a completely novel class. A solution must therefore both recognize known categories and discover new ones, without being told how many new classes exist.
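
For concreteness, the setting can be written in set notation. The following is a sketch of the formalization; the symbols $N$, $M$, $\mathcal{X}$, $\mathcal{Y}_{\mathcal{L}}$ and $\mathcal{Y}_{\mathcal{U}}$ are notation introduced here for illustration, and the inclusion $\mathcal{Y}_{\mathcal{L}} \subset \mathcal{Y}_{\mathcal{U}}$ encodes the assumption that every labelled class may also appear among the unlabelled images:

$$
\mathcal{D}_{\mathcal{L}} = \{(x_i, y_i)\}_{i=1}^{N} \subset \mathcal{X} \times \mathcal{Y}_{\mathcal{L}},
\qquad
\mathcal{D}_{\mathcal{U}} = \{(x_i, y_i)\}_{i=1}^{M} \subset \mathcal{X} \times \mathcal{Y}_{\mathcal{U}},
\qquad
\mathcal{Y}_{\mathcal{L}} \subset \mathcal{Y}_{\mathcal{U}}.
$$

Here $\mathcal{Y}_{\mathcal{L}}$ is the set of labelled classes and $\mathcal{Y}_{\mathcal{U}}$ the set of classes occurring in the unlabelled data. The labels $y_i$ in $\mathcal{D}_{\mathcal{U}}$ are hidden from the model and serve only for evaluation, and the number of novel classes, $|\mathcal{Y}_{\mathcal{U}} \setminus \mathcal{Y}_{\mathcal{L}}|$, is not assumed to be known.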

Methodology

Contrastive Learning with Vision Transformers: The authors pair vision transformers, whose features support strong nearest-neighbor classification, with contrastive representation learning. Because classes are then assigned non-parametrically by clustering in feature space, the approach avoids the overfitting to the limited set of known classes that a parametric classifier head can suffer.
Semi-Supervised $k$-Means: A semi-supervised $k$-means algorithm clusters the data, with the labelled samples guiding clusters towards the known categories. Centroids for known classes are initialized from the ground-truth labels, while the additional centroids are initialized from unlabelled samples.
Estimating Class Number: To estimate the number of classes in the unlabelled data, the method searches, with a black-box optimization algorithm, for the number of clusters that maximizes clustering accuracy on the labelled subset.
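
The semi-supervised $k$-means step described above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the authors' implementation: the function name `semi_supervised_kmeans`, its parameters, the k-means++-style seeding of the extra centroids, and the toy 2-D points are all assumptions made for the example. It assumes class ids are `0..n_known-1`, each with at least one labelled sample; in practice the inputs would be ViT feature vectors rather than hand-written points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def semi_supervised_kmeans(x_lab, y_lab, x_unl, k, n_iter=50, seed=0):
    """Cluster x_unl into k groups while labelled points stay in their class.

    Assumes y_lab holds integer class ids 0..n_known-1 (each class non-empty)
    and k >= n_known. Returns (centroids, cluster index per unlabelled point).
    """
    rng = random.Random(seed)
    n_known = max(y_lab) + 1
    # Known-class centroids come directly from the ground-truth labels.
    cents = [centroid([x for x, y in zip(x_lab, y_lab) if y == c])
             for c in range(n_known)]
    # Extra centroids are seeded from unlabelled points, k-means++ style:
    # sample in proportion to squared distance from the nearest centroid.
    while len(cents) < k:
        weights = [min(sq_dist(x, c) for c in cents) for x in x_unl]
        cents.append(list(rng.choices(x_unl, weights=weights, k=1)[0]))
    assign = []
    for _ in range(n_iter):
        # Assignment step: only unlabelled points are free to change cluster.
        new_assign = [min(range(k), key=lambda j: sq_dist(x, cents[j]))
                      for x in x_unl]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: labelled points always count towards their own class.
        for j in range(k):
            members = [x for x, a in zip(x_unl, assign) if a == j]
            members += [x for x, y in zip(x_lab, y_lab) if y == j]
            if members:
                cents[j] = centroid(members)
    return cents, assign


# Toy data: two known classes plus an unseen group near (10, 0).
x_lab = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y_lab = [0, 0, 1, 1]
x_unl = [[0.1, 0.0], [4.9, 5.2], [10.0, 0.0], [10.2, 0.1], [9.9, -0.1]]
cents, assign = semi_supervised_kmeans(x_lab, y_lab, x_unl, k=3)
print(assign)
```

On this toy data the two unlabelled points near the labelled groups join classes 0 and 1, while the three points around (10, 0) form a new third cluster. Keeping labelled points fixed to their class in the update step is what distinguishes this sketch from plain $k$-means.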

Results and Implications

The empirical evaluation spans generic object classification datasets (CIFAR-10, CIFAR-100, ImageNet-100) and fine-grained ones (CUB, Stanford Cars, Herbarium19). Across these, the combination of contrastive training with semi-supervised clustering substantially outperforms the baselines adapted from novel category discovery. In particular, contrastively trained ViT features cluster well directly in feature space, removing the need for parametric classifier heads that risk overfitting.

These results suggest that generalized category discovery can address realistic, open-world image recognition scenarios, such as identifying unknown pathologies in medical imaging or novel objects in autonomous driving. The setting moves away from strict assumptions about which categories unlabelled data may contain and emphasizes adaptability in recognition models.

Future Developments

The authors’ methods demonstrate the versatility of contrastive learning in conjunction with vision transformers. However, scaling these techniques to larger datasets, or integrating them with domain adaptation methods to handle domain shift between the labelled and unlabelled sets, could be valuable avenues for future work. Applying these approaches in continual learning settings, where the model learns incrementally from new data without re-training from scratch, is a further challenge that would extend the utility of GCD frameworks.

The paper both broadens the range of image classification tasks and lays groundwork for recognition systems that continue to adapt in dynamic environments.
