Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation (2208.00219v1)

Published 30 Jul 2022 in cs.CV, cs.AI, cs.LG, and cs.MM

Abstract: Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes. Such limitations hinder the generalization of base-class knowledge for the detection of novel-class objects. In this work, we design Meta-DETR, which (i) is the first image-level few-shot detector, and (ii) introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes for robust and accurate few-shot object detection. Meta-DETR works entirely at image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, the introduced correlational meta-learning enables Meta-DETR to simultaneously attend to multiple support classes within a single feedforward, which allows to capture the inter-class correlation among different classes, thus significantly reducing the misclassification over similar classes and enhancing knowledge generalization to novel classes. Experiments over multiple few-shot object detection benchmarks show that the proposed Meta-DETR outperforms state-of-the-art methods by large margins. The implementation codes are available at https://github.com/ZhangGongjie/Meta-DETR.

Citations (73)

Summary

  • The paper introduces Meta-DETR, which eliminates region proposals by employing a DETR-based image-level few-shot detection framework.
  • It utilizes an inter-class correlational meta-learning strategy that processes multiple support classes simultaneously to reduce misclassification.
  • Meta-DETR achieves superior mAP on benchmark datasets, demonstrating robust performance even with minimal labeled data.

Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

The paper presents Meta-DETR, a few-shot object detector that differs from prior work chiefly by operating at the image level, without region proposals. Earlier few-shot detectors typically build on region-based frameworks such as Faster R-CNN, whose region proposals tend to be of low quality for novel classes. Meta-DETR instead builds on DETR and predicts directly at the image level, so inaccurate proposals no longer constrain the detection of novel objects.
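To make "prediction without region proposals" concrete, the sketch below illustrates the set-prediction idea that DETR-style detectors rely on: the model emits a fixed set of boxes, and each ground-truth box is paired with exactly one prediction through a minimum-cost one-to-one matching. This is a simplified illustration rather than Meta-DETR's implementation. The function names, the L1-only cost, and the brute-force search are assumptions made for readability; DETR's actual matching cost also includes classification and overlap terms and is solved with the Hungarian algorithm.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """Sum of absolute coordinate differences between two (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_predictions(pred_boxes, gt_boxes):
    """One-to-one matching of ground-truth boxes to predictions.

    Returns (assignment, total_cost), where assignment[i] is the index of the
    prediction matched to ground-truth box i. Brute force over permutations,
    so this is only suitable for tiny illustrative inputs.
    """
    if len(gt_boxes) > len(pred_boxes):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    best_assignment, best_cost = (), float("inf")
    for candidate in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            l1_box_cost(pred_boxes[pred_idx], gt_boxes[gt_idx])
            for gt_idx, pred_idx in enumerate(candidate)
        )
        if cost < best_cost:
            best_assignment, best_cost = candidate, cost
    return best_assignment, best_cost


# Hypothetical normalized (cx, cy, w, h) boxes, invented for this example.
predictions = [
    (0.50, 0.50, 0.20, 0.20),
    (0.10, 0.10, 0.05, 0.05),
    (0.80, 0.30, 0.10, 0.40),
]
ground_truth = [
    (0.82, 0.31, 0.10, 0.38),
    (0.48, 0.52, 0.20, 0.22),
]

assignment, total_cost = match_predictions(predictions, ground_truth)
print(assignment, round(total_cost, 2))
```

Predictions left unmatched (index 1 here) would be trained toward a "no object" label in a DETR-style setup, which is what removes the need for a separate proposal stage.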

A second key component of Meta-DETR is its inter-class correlational meta-learning strategy, which lets the model capture and exploit correlations among different classes. Whereas previous approaches treat each support class independently, Meta-DETR attends to multiple support classes simultaneously within a single feedforward pass. This improves generalization to novel classes and significantly reduces misclassification among similar classes.
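The toy example below gives one intuition for why scoring support classes together can help with similar classes. It is not the paper's aggregation module: the prototype vectors, the cosine similarity, the softmax with a temperature, and all names are assumptions chosen for illustration. Scored independently, two similar classes both look like strong matches for the query; scored jointly, they must compete for the same probability mass.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def independent_scores(query, prototypes):
    """Score each support class on its own; no class sees the others."""
    return {name: cosine(query, proto) for name, proto in prototypes.items()}


def joint_scores(query, prototypes, temperature=0.1):
    """Score all support classes together; the softmax makes classes compete."""
    logits = {name: cosine(query, proto) / temperature
              for name, proto in prototypes.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - max_logit) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Hypothetical 3-dimensional class prototypes and query feature.
prototypes = {
    "cow": (1.0, 0.2, 0.0),
    "horse": (0.9, 0.4, 0.0),
    "bicycle": (0.0, 0.1, 1.0),
}
query = (1.0, 0.15, 0.05)

separate = independent_scores(query, prototypes)
joint = joint_scores(query, prototypes)
print({name: round(score, 3) for name, score in separate.items()})
print({name: round(score, 3) for name, score in joint.items()})
```

In this example both "cow" and "horse" receive cosine similarities above 0.9 when scored separately, while the joint scores sum to one and rank "cow" clearly above "horse".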

The paper reports that Meta-DETR achieves superior performance on several few-shot object detection benchmarks, including Pascal VOC and MS COCO, outperforming state-of-the-art methods by substantial margins. Numerical results highlight significant improvements in detection mAP across varying shot settings, underscoring Meta-DETR's efficacy in learning from minimal labeled data.
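For readers unfamiliar with the metric, the sketch below shows how a mean average precision figure can be computed from ranked detections. It uses a simple non-interpolated average precision. The official Pascal VOC and MS COCO protocols add interpolation and IoU thresholds that are omitted here, and the detections and class names are invented for illustration, not taken from the paper.

```python
def average_precision(detections, num_ground_truth):
    """Non-interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
            # Precision measured at the rank of each correct detection.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


def mean_average_precision(per_class):
    """per_class maps class name -> (detections, num_ground_truth)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical detections for two novel classes.
per_class = {
    "class_a": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "class_b": ([(0.6, False), (0.5, True)], 1),
}

ap_a = average_precision(*per_class["class_a"])
ap_b = average_precision(*per_class["class_b"])
map_score = mean_average_precision(per_class)
print(round(ap_a, 4), round(ap_b, 4), round(map_score, 4))
```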

Practically, the implications of Meta-DETR are considerable. By removing the dependency on region proposals, the model generalizes more robustly even from extremely limited samples. Moreover, the ability to exploit inter-class correlations could benefit real-world applications where novel object categories appear frequently and annotations are scarce.

Theoretically, the success of Meta-DETR emphasizes the potential of image-level frameworks in few-shot detection and the utility of correlational learning strategies. As the research community continues to explore few-shot and zero-shot learning paradigms, Meta-DETR sets a precedent for future models to leverage holistic image features and class relationships to improve learning efficiency and accuracy.

Future research may delve into integrating multi-scale features into the Meta-DETR framework, potentially improving detection capabilities for small or occluded objects. Furthermore, extending the correlational meta-learning strategy to other vision tasks, such as segmentation or tracking, provides a promising avenue for expanding the framework's applicability.

In conclusion, Meta-DETR represents a significant advancement in few-shot object detection by forgoing traditional region-based methodologies and embracing an image-level approach that exploits inter-class correlation. This framework not only enhances the adaptability and accuracy of object detectors but also provides a strong foundation for further research and development in the field.