Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 71 tok/s
Gemini 2.5 Pro 52 tok/s Pro
GPT-5 Medium 18 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 101 tok/s Pro
Kimi K2 196 tok/s Pro
GPT OSS 120B 467 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models (2311.16475v2)

Published 26 Nov 2023 in cs.CV

Abstract: Human-Object Interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions. However, conventional HOI detection methods often struggle to fully capture the contextual information needed to accurately identify these interactions. While large Vision-LLMs (VLMs) show promise in tasks involving human interactions, they are not tailored for HOI detection. The complexity of human behavior and the diverse contexts in which these interactions occur make it further challenging. Contextual cues, such as the participants involved, body language, and the surrounding environment, play crucial roles in predicting these interactions, especially those that are unseen or ambiguous. Moreover, large VLMs are trained on vast image and text data, enabling them to generate contextual cues that help in understanding real-world contexts, object relationships, and typical interactions. Building on this, in this paper we introduce ConCue, a novel approach for improving visual feature extraction in HOI detection. Specifically, we first design specialized prompts to utilize large VLMs to generate contextual cues within an image. To fully leverage these cues, we develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors. Extensive experiments and analyses demonstrate the effectiveness of using these contextual cues for HOI detection. The experimental results show that integrating ConCue with existing state-of-the-art methods significantly enhances their performance on two widely used datasets.

Citations (2)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.