- The paper introduces the SkeleMotion representation, which incorporates explicit motion dynamics to enhance 3D action recognition.
- It employs a Temporal Scale Aggregation mechanism to capture multi-frame dynamics and reduce noise in skeletal movements.
- Experiments on NTU RGB+D 60 and 120 demonstrate significant accuracy improvements, notably achieving 80.1% on NTU RGB+D 60.
SkeleMotion: Motion-Based Skeleton Representation for 3D Action Recognition
The paper "SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition" presents a novel approach to leveraging skeletal data for 3D action recognition with convolutional neural networks (CNNs). The authors focus on the temporal dynamics inherent in skeleton joint sequences, moving beyond the conventional spatial structural representation of joints used in most action recognition pipelines.
Core Contributions
The paper introduces the SkeleMotion representation, which departs from purely spatial encoding by explicitly incorporating motion information through the magnitude and orientation of joint movements computed over time. This representation captures temporal variation robustly, enriching the input to CNNs and modeling longer-range dynamics than earlier skeleton-image techniques.
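To make the idea concrete, here is a minimal sketch of how per-joint motion magnitude and orientation might be computed from a 3D joint sequence. This is an illustrative reconstruction, not the paper's exact encoding: the function name `motion_maps`, the array layout, and the choice of coordinate-plane angles are assumptions for the example.

```python
import numpy as np

def motion_maps(joints, d=1):
    """Sketch of SkeleMotion-style motion maps (illustrative, not the paper's code).

    joints: array of shape (T, J, 3) holding 3D coordinates of J joints over T frames.
    d: temporal distance between the frames being compared.
    Returns a magnitude map of shape (T-d, J) and an orientation map of
    shape (T-d, J, 3) holding angles in the xy, yz, and xz planes.
    """
    disp = joints[d:] - joints[:-d]            # per-joint displacement vectors
    magnitude = np.linalg.norm(disp, axis=-1)  # Euclidean norm = motion magnitude
    # Orientation of each displacement, expressed as an angle in each
    # coordinate plane via arctan2 (assumed convention for this sketch).
    theta_xy = np.arctan2(disp[..., 1], disp[..., 0])
    theta_yz = np.arctan2(disp[..., 2], disp[..., 1])
    theta_xz = np.arctan2(disp[..., 2], disp[..., 0])
    orientation = np.stack([theta_xy, theta_yz, theta_xz], axis=-1)
    return magnitude, orientation
```

Stacked over frames and joints, such maps form image-like arrays that a standard 2D CNN can consume, which is the central idea behind skeleton-image representations.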
A notable element of the approach is Temporal Scale Aggregation (TSA), which computes motion over multiple frame distances and combines the results. Aggregating across scales mitigates frame-level noise and lets the representation capture movement dynamics at both short and long temporal ranges.
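The aggregation step can be sketched as computing motion magnitudes at several temporal distances and stacking them as channels. Again this is a hedged illustration under assumed conventions (function name `tsa_stack`, cropping to a common length, channel-wise stacking), not the paper's exact procedure.

```python
import numpy as np

def tsa_stack(joints, scales=(1, 2, 4)):
    """Illustrative temporal-scale aggregation of motion magnitudes.

    For each temporal distance d in `scales`, compute per-joint motion
    magnitudes, crop all maps to a common length, and stack them as
    channels so a CNN sees short- and long-range dynamics side by side.
    """
    T = joints.shape[0]
    common_len = T - max(scales)                # shortest map length
    maps = []
    for d in scales:
        disp = joints[d:] - joints[:-d]         # displacement at distance d
        mag = np.linalg.norm(disp, axis=-1)     # shape (T - d, J)
        maps.append(mag[:common_len])           # align all scales in time
    return np.stack(maps, axis=-1)              # shape (common_len, J, len(scales))
```

The stacked output behaves like a multi-channel image, with one channel per temporal scale, which matches the summary's description of encapsulating movement across varied temporal scales.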
Experimental Insights and Performance
The proposed SkeleMotion method was validated on the NTU RGB+D 60 and NTU RGB+D 120 datasets, which span a broad range of human actions captured with Kinect sensors. Benchmarked against leading approaches, the representation outperformed previous state-of-the-art methods by a clear margin under the cross-view protocol, reaching an accuracy of 80.1% on NTU RGB+D 60.
By combining SkeleMotion with spatial structural representations such as the Tree Structure Skeleton Image (TSSI) through early and late fusion, further gains were obtained, yielding state-of-the-art performance on the larger NTU RGB+D 120 dataset.
Implications and Forward-Looking Perspectives
This research contributes significantly to 3D action recognition by enabling computational models to incorporate motion dynamics explicitly, aligning more closely with how human motion actually unfolds. It reframes the processing of skeleton data, advocating richer, motion-aware inputs that strengthen the representational power of CNNs.
The implications of this approach are broad, spanning applications in surveillance, healthcare monitoring, and human-robot collaboration. Efficient modeling of motion helps systems recognize intricate actions, improving interaction outcomes and safety.
Looking to the future, the approach encourages explorations into diverse architectures and fine-tuning of models to further capitalize on the explicit motion data through deeper or more varied network designs. In addition, applying similar frameworks to 2D action datasets could enhance recognition performance where real-time skeleton data capture remains challenging.
Conclusion
Overall, SkeleMotion marks a pivotal step forward in using skeletal data for action recognition tasks, combining technical depth with practical utility. The comprehensive handling of motion dynamics refines the understanding of human actions and positions this methodology at the forefront of computer vision research, where skeleton data remains a vital component for intelligent systems learning behavioral patterns and interactions.