Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 56 tok/s
Gemini 2.5 Pro 39 tok/s Pro
GPT-5 Medium 15 tok/s Pro
GPT-5 High 16 tok/s Pro
GPT-4o 99 tok/s Pro
Kimi K2 155 tok/s Pro
GPT OSS 120B 476 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection (1905.04430v2)

Published 11 May 2019 in cs.CV and cs.LG

Abstract: Retailers have long been searching for ways to effectively understand their customers' behaviour in order to provide a smooth and pleasant shopping experience that attracts more customers everyday and maximises their revenue, consequently. Humans can flawlessly understand others' behaviour by combining different visual cues from activity to gestures and facial expressions. Empowering the computer vision systems to do so, however, is still an open problem due to its intrinsic challenges as well as extrinsic enforced difficulties like lack of publicly available data and unique environment conditions (wild). In this work, We emphasise on detecting the first and by far the most crucial cue in behaviour analysis; that is human activity detection in computer vision. To do so, we introduce a framework for integrating human pose and object motion to both temporally detect and classify the activities in a fine-grained manner (very short and similar activities). We incorporate partial human pose and interaction with the objects in a multi-stream neural network architecture to guide the spatiotemporal attention mechanism for more efficient activity recognition. To this end, in the absence of pose supervision, we propose to use the Generative Adversarial Network (GAN) to generate exact joint locations from noisy probability heat maps. Additionally, based on the intuition that complex actions demand more than one source of information to be identified even by humans, we integrate the second stream of object motion to our network as a prior knowledge that we quantitatively show improves the recognition results. We empirically show the capability of our approach by achieving state-of-the-art results on MERL shopping dataset. We further investigate the effectiveness of this approach on a new shopping dataset that we have collected to address existing shortcomings.

Citations (2)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube