Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 171 tok/s

Gemini 2.5 Pro 41 tok/s Pro

GPT-5 Medium 28 tok/s Pro

GPT-5 High 31 tok/s Pro

GPT-4o 92 tok/s Pro

Kimi K2 202 tok/s Pro

GPT OSS 120B 435 tok/s Pro

Claude Sonnet 4.5 35 tok/s Pro

2000 character limit reached

Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation (2203.10593v1)

Published 20 Mar 2022 in cs.CV

Abstract: Open-vocabulary object detection aims to detect novel object categories beyond the training set. The advanced open-vocabulary two-stage detectors employ instance-level visual-to-visual knowledge distillation to align the visual space of the detector with the semantic space of the Pre-trained Visual-LLM (PVLM). However, in the more efficient one-stage detector, the absence of class-agnostic object proposals hinders the knowledge distillation on unseen objects, leading to severe performance degradation. In this paper, we propose a hierarchical visual-language knowledge distillation method, i.e., HierKD, for open-vocabulary one-stage detection. Specifically, a global-level knowledge distillation is explored to transfer the knowledge of unseen categories from the PVLM to the detector. Moreover, we combine the proposed global-level knowledge distillation and the common instance-level knowledge distillation to learn the knowledge of seen and unseen categories simultaneously. Extensive experiments on MS-COCO show that our method significantly surpasses the previous best one-stage detector with 11.9\% and 6.7\% $AP_{50}$ gains under the zero-shot detection and generalized zero-shot detection settings, and reduces the $AP_{50}$ performance gap from 14\% to 7.3\% compared to the best two-stage detector.

Citations (70)

View on Semantic Scholar