Training-Free Action Recognition and Goal Inference with Dynamic Frame Selection (2401.12471v2)

Published 23 Jan 2024 in cs.CV

Abstract: We introduce VidTFS, a training-free, open-vocabulary framework for video goal and action inference that combines a frozen vision foundation model (VFM) and a large language model (LLM) with a novel dynamic Frame Selection module. Our experiments show that the proposed frame selection module significantly improves the framework's performance. We validate VidTFS on four widely used video datasets (CrossTask, COIN, UCF101, and ActivityNet), covering goal inference and action recognition under open-vocabulary settings without any training or fine-tuning. The results show that VidTFS outperforms pretrained and instruction-tuned multimodal LLMs that directly stack an LLM on a VFM for downstream video inference tasks. Owing to its adaptability, VidTFS shows potential for generalizing to new training-free video inference tasks.
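The abstract does not describe how the dynamic Frame Selection module chooses frames. Purely as an illustration of training-free frame selection in general, and not as the VidTFS algorithm, the sketch below assumes that every frame and every candidate label (an action or goal name) has already been embedded into a shared space by some frozen vision-language encoder. The helpers `cosine`, `select_frames`, and `classify` are hypothetical: they keep the `k` frames whose best label similarity is highest, then return the label with the highest mean cosine similarity over those frames. Nothing is trained.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_frames(frame_embs: Sequence[Vector], label_embs: Sequence[Vector], k: int) -> List[int]:
    """Hypothetical heuristic: score each frame by its highest similarity to any
    candidate label, keep the top-k frames, and return their indices in temporal order."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = [max(cosine(f, l) for l in label_embs) for f in frame_embs]
    ranked = sorted(range(len(frame_embs)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def classify(
    frame_embs: Sequence[Vector],
    label_embs: Sequence[Vector],
    labels: Sequence[str],
    k: int,
) -> Tuple[str, List[int]]:
    """Average each label's similarity over the selected frames and return the best
    label together with the indices of the frames that were kept."""
    keep = select_frames(frame_embs, label_embs, k)
    mean_scores = [
        sum(cosine(frame_embs[i], l) for i in keep) / len(keep) for l in label_embs
    ]
    best = max(range(len(labels)), key=lambda j: mean_scores[j])
    return labels[best], keep


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for outputs of a frozen encoder (assumed, not real data).
    labels = ["cut onion", "pour water"]
    label_embs = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[0.7, 0.7], [1.0, 0.05], [0.9, 0.1], [0.6, 0.8]]
    print(classify(frames, label_embs, labels, k=2))  # prints ('cut onion', [1, 2])
```

In this toy example the ambiguous frame at index 0 and the weaker frame at index 3 are discarded, so the prediction rests only on the two most label-consistent frames. Ranking frames by their best label match is one simple choice among many; the actual selection criterion used by VidTFS is described in the paper, not here.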
