An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022 (2211.08776v1)

Published 16 Nov 2022 in cs.CV and cs.IR

Abstract: This technical report describes the CONE approach for Ego4D Natural Language Queries (NLQ) Challenge in ECCV 2022. We leverage our model CONE, an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices the long video into candidate windows via a sliding window approach. Centering at windows, CONE (1) learns the inter-window (coarse-grained) semantic variance through contrastive learning and speeds up inference by pre-filtering the candidate windows relevant to the NL query, and (2) conducts intra-window (fine-grained) candidate moments ranking utilizing the powerful multi-modal alignment ability of the contrastive vision-text pre-trained model EgoVLP. On the blind test set, CONE achieves 15.26 and 9.24 for R1@IoU=0.3 and R1@IoU=0.5, respectively.

Citations (6)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022 (2211.08776v1)

Summary

Related Papers