Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 62 tok/s
Gemini 2.5 Pro 48 tok/s Pro
GPT-5 Medium 14 tok/s Pro
GPT-5 High 13 tok/s Pro
GPT-4o 93 tok/s Pro
Kimi K2 213 tok/s Pro
GPT OSS 120B 458 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval (2007.02503v1)

Published 6 Jul 2020 in cs.CV

Abstract: The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm on retrieval with simple queries, which are usually ineffective for complex queries that carry far more complex semantics. Recently, embedding-based paradigm has emerged as a popular approach. It aims to map the queries and videos into a shared embedding space where semantically-similar texts and videos are much closer to each other. Despite its simplicity, it forgoes the exploitation of the syntactic structure of text queries, making it suboptimal to model the complex queries. To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method by jointly learning the linguistic structure of queries and the temporal representation of videos. Specifically, given a complex user query, we first recursively compose a latent semantic tree to structurally describe the text query. We then design a tree-augmented query encoder to derive structure-aware query representation and a temporal attentive video encoder to model the temporal characteristics of videos. Finally, both the query and videos are mapped into a joint embedding space for matching and ranking. In this approach, we have a better understanding and modeling of the complex queries, thereby achieving a better video retrieval performance. Extensive experiments on large scale video retrieval benchmark datasets demonstrate the effectiveness of our approach.

Citations (130)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube