Quo Vadis, Skeleton Action Recognition? (2007.02072v2)

Published 4 Jul 2020 in cs.CV, cs.GR, and cs.MM

Abstract: In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To study skeleton-action recognition in the wild, we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. We extend our study to include out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. We benchmark state-of-the-art models on the NTU-120 dataset and provide a multi-layered assessment of the results. The results from benchmarking the top performers of NTU-120 on the newly introduced datasets reveal the challenges and domain gap induced by actions in the wild. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. Via the introduced datasets, our work enables new frontiers for human action recognition.

Summary

The paper presents three datasets—Skeletics-152, Skeleton-Mimetics, and Metaphorics—that benchmark diverse scenarios in skeleton-based action recognition.
The authors curate the datasets carefully and use transfer learning (pre-training on NTU-120, then fine-tuning on Skeletics-152), evaluating models on genuine 3-D skeleton representations rather than pseudo 3-D ones.
The study reveals significant gaps in current models, especially on abstract and metaphorical actions, and highlights the need for better semantic mapping between body motion and language.

A Scrutiny of Current Trends and Emerging Frontiers in Skeleton-Based Action Recognition

Skeleton-based action recognition has advanced steadily, driven by larger datasets and more sophisticated models. The paper "Quo Vadis, Skeleton Action Recognition?" surveys the state of the art in this domain, introduces new datasets, and discusses their potential impact on the research community.

Datasets and Methodology

The authors introduce three datasets: Skeletics-152, Skeleton-Mimetics, and Metaphorics. Each poses a distinct challenge for skeleton-based action recognition.

Skeletics-152 is a curated subset of Kinetics-700 whose RGB videos are annotated with 3-D poses. It targets recognition in the wild, where the same action is performed in many different ways and across diverse interaction scenarios. A key curation step was selecting Kinetics-700 classes that remain recognizable from the skeleton alone, excluding those affected by occlusion, egocentric viewpoints, or reliance on contextual cues. Unlike earlier skeleton versions of Kinetics such as Skeleton-Kinetics-400, whose joints are 2-D keypoints paired with a detection-confidence score (a pseudo 3-D representation), Skeletics-152 provides true 3-D joint positions and therefore captures action dynamics more faithfully.
Skeleton-Mimetics is derived from the Mimetics dataset and focuses on exaggerated, out-of-context actions, pushing skeleton action recognition beyond inherently contextual scenarios. Its difficulty stems from its focus on mimicry experts who perform actions without the objects those actions normally involve, often exaggerating their movements, so models must recognize actions from motion patterns that differ from ordinary, context-rich footage.
Metaphorics explores abstract actions through videos of Dumb Charades games and interpretative dance performances. Unlike traditional datasets with a fixed set of categories, it is annotated with caption-style phrases from an open-ended vocabulary, and many of the depicted actions are metaphorical. Recognizing them requires a model to map skeleton sequences to free-form phrases rather than to class labels, a notable departure from the standard classification paradigm.
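
To make the "true 3-D versus pseudo 3-D" distinction concrete, the sketch below stores a skeleton clip in the (channels, frames, joints, persons) layout used by ST-GCN-style models. It is a minimal illustration under stated assumptions: the `SkeletonClip` class, its field names, the `zeros` helper, and the clip sizes are all hypothetical, and neither dataset is necessarily stored this way.

```python
from dataclasses import dataclass
from typing import List, Tuple


def zeros(c: int, t: int, v: int, m: int) -> List[List[List[List[float]]]]:
    """Build a nested list indexed [channel][frame][joint][person], all zeros."""
    return [[[[0.0] * m for _ in range(v)] for _ in range(t)] for _ in range(c)]


@dataclass
class SkeletonClip:
    """Hypothetical container for one clip shaped (C, T, V, M)."""
    data: List[List[List[List[float]]]]  # indexed [c][t][v][m]
    channel_names: List[str]             # meaning of each of the C channels

    @property
    def shape(self) -> Tuple[int, int, int, int]:
        return (len(self.data), len(self.data[0]),
                len(self.data[0][0]), len(self.data[0][0][0]))

    def is_true_3d(self) -> bool:
        # Genuine 3-D only if the first three channels are x, y and z;
        # (x, y, confidence) is the "pseudo 3-D" case.
        return self.channel_names[:3] == ["x", "y", "z"]


# Arbitrary sizes: 3 channels, 4 frames, 25 joints, 1 person.
true_3d = SkeletonClip(zeros(3, 4, 25, 1), ["x", "y", "z"])
pseudo_3d = SkeletonClip(zeros(3, 4, 25, 1), ["x", "y", "confidence"])
print(true_3d.shape, pseudo_3d.shape, true_3d.is_true_3d(), pseudo_3d.is_true_3d())
```

Both clips have the same tensor shape; only the meaning of the third channel differs. That is why a model can consume pseudo 3-D data without any error while never actually seeing depth.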

Key Results and Observations

The paper benchmarks existing models on these datasets and provides a detailed performance analysis:

On Skeletics-152, MS-G3D and 4s-ShiftGCN perform competitively, although intra-class variability and noise from real-world conditions remain challenging. Pre-training on NTU-120 and then fine-tuning on Skeletics-152 yielded tangible improvements, indicating that models transfer from controlled settings to in-the-wild data.
On Skeleton-Mimetics, models trained on Skeletics-152 proved fairly robust, although the out-of-context, exaggerated actions still posed difficulties. This shows the potential of transfer learning from the curated Skeletics-152 to more specialized datasets, as well as the need for models that adapt to varied action semantics.
Metaphorics, with its metaphorical content and short action sequences, proved especially challenging. Predictions from models trained on NTU-120 and Skeletics-152 aligned poorly with the metaphorical ground-truth phrases, revealing substantial gaps in current skeleton recognition abilities. The dataset underscores the limitations of existing classification models when abstract actions demand an understanding that goes beyond precise, verb-based categories.
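
Classification benchmarks of this kind are commonly scored with top-k accuracy (top-1 on NTU-style benchmarks, often top-1 and top-5 on Kinetics-derived ones). The helper below is a generic sketch, not the paper's evaluation code. The `allowed` argument is a hypothetical option for cross-dataset evaluation, such as scoring a model trained on a larger label set against a dataset that covers only some of those classes; how the authors actually mapped labels between datasets is not described here.

```python
from typing import Optional, Sequence, Set


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
    allowed: Optional[Set[int]] = None,
) -> float:
    """Fraction of samples whose true label is among the k highest scores.

    If `allowed` is given (hypothetical cross-dataset option), classes
    outside it are dropped before ranking.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        candidates = [c for c in range(len(row)) if allowed is None or c in allowed]
        # Rank the remaining classes by score, highest first.
        ranked = sorted(candidates, key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # sample 0: class 1 scores highest
    [0.5, 0.1, 0.4],  # sample 1: class 0 highest, class 2 second
]
labels = [1, 2]
print(top_k_accuracy(scores, labels, k=1))                   # 0.5
print(top_k_accuracy(scores, labels, k=2))                   # 1.0
print(top_k_accuracy(scores, labels, k=1, allowed={1, 2}))   # 1.0
```

Restricting the candidate set changes the result for sample 1: once class 0 is excluded, class 2 becomes the top prediction, which is the kind of effect a label-mapping choice can have on reported cross-dataset numbers.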

Implications and Future Directions

The datasets and findings point to several research directions, chief among them more generalizable and robust skeleton recognition models that can handle variability in how actions are performed and in their contextual cues. Abstract, metaphorical datasets such as Metaphorics call for a shift toward approaches that map actions to broader semantic descriptions, potentially leveraging advances in natural language processing for better embeddings and contextualization.
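
One way to read "semantic mapping" is retrieval in a shared embedding space: encode a skeleton clip (or its predicted label) and each candidate phrase as vectors, then rank the phrases by cosine similarity. The sketch below shows only that ranking step. The vectors and phrases are invented toy values, and the encoders that would produce real embeddings are assumed; the paper does not prescribe this method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_phrases(
    query: Sequence[float], phrases: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (phrase, similarity) pairs sorted best first."""
    scored = [(p, cosine(query, v)) for p, v in phrases.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-D "embeddings", invented for illustration only; a real system would
# take them from pretrained skeleton and text encoders sharing one space.
clip_embedding = [0.9, 0.1, 0.0]
phrases = {
    "a fish out of water": [0.7, 0.3, 0.1],
    "climbing a mountain": [0.1, 0.9, 0.2],
    "cooking dinner": [0.0, 0.2, 0.9],
}
for phrase, score in rank_phrases(clip_embedding, phrases):
    print(f"{score:.3f}  {phrase}")
```

Ranking against an open phrase list, instead of taking an argmax over fixed classes, is what lets such a system score free-form answers like the caption-style annotations in Metaphorics.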

The released datasets could open new avenues of exploration, such as studies of human-machine collaboration in action understanding or real-time action recognition systems with broad vocabularies. Making the data available and establishing benchmarks invites the community to drive further progress in skeleton-based action recognition.
