A Comparative Review of Recent Kinect-based Action Recognition Algorithms (1906.09955v1)

Published 24 Jun 2019 in cs.CV

Abstract: Video-based human action recognition is currently one of the most active research areas in computer vision. Various research studies indicate that the performance of action recognition is highly dependent on the type of features being extracted and how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there still does not exist a thorough comparison of these Kinect-based techniques under the grouping of feature types, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject action recognition and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that the majority of methods perform better on cross-subject action recognition than cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.

Citations (203)

Summary

  • The paper provides a detailed comparative evaluation of Kinect-based action recognition algorithms, analyzing both handcrafted and deep learning models on varied datasets.
  • It demonstrates that handcrafted features perform robustly on small datasets by avoiding overfitting, while deep learning methods excel with large-scale data like NTU RGB+D.
  • The study reveals that integrating depth and skeleton features enhances cross-view recognition, although depth data remains sensitive to noise in challenging scenarios.

A Comparative Review of Recent Kinect-based Action Recognition Algorithms

The paper "A Comparative Review of Recent Kinect-based Action Recognition Algorithms" offers an in-depth comparative analysis of state-of-the-art algorithms for human action recognition that use data from Kinect sensors. It distinguishes itself by grouping methods by feature type and comparing across those groups: handcrafted versus deep learning features, and depth-based versus skeleton-based features.

The authors conducted experiments using six benchmark datasets: MSRAction3D, 3D Action Pairs, CAD-60, UWA3D Activity Dataset, UWA3D Multiview Activity II, and the extensive NTU RGB+D dataset. The comparison involved ten algorithms, including both traditional handcrafted methods and modern deep learning approaches. These algorithms ranged from the earlier HON4D and HOPC methods to more recent models such as ST-GCN and IndRNN. The experimental scenarios were designed to evaluate the algorithms' performance under different cross-subject and cross-view configurations.
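The two evaluation protocols can be made concrete with a minimal sketch. The splitting logic below reflects the standard cross-subject and cross-view conventions for Kinect datasets such as NTU RGB+D; the field names and toy sample values are illustrative assumptions, not the paper's actual data layout:

```python
# Sketch of cross-subject vs cross-view evaluation splits for a
# Kinect action dataset. Each recording is tagged with the performing
# subject and the recording camera; the two protocols hold out
# different recordings for testing.

def split_samples(samples, train_subjects, train_cameras):
    """Partition samples into cross-subject and cross-view train/test sets.

    Each sample is a dict with 'subject', 'camera', and 'label' keys
    (illustrative schema).
    """
    # Cross-subject: train on some performers, test on unseen performers.
    cs_train = [s for s in samples if s["subject"] in train_subjects]
    cs_test = [s for s in samples if s["subject"] not in train_subjects]
    # Cross-view: train on some camera viewpoints, test on unseen viewpoints.
    cv_train = [s for s in samples if s["camera"] in train_cameras]
    cv_test = [s for s in samples if s["camera"] not in train_cameras]
    return (cs_train, cs_test), (cv_train, cv_test)

# Toy example: four recordings from two subjects and two cameras.
samples = [
    {"subject": 1, "camera": 1, "label": "wave"},
    {"subject": 1, "camera": 2, "label": "wave"},
    {"subject": 2, "camera": 1, "label": "kick"},
    {"subject": 2, "camera": 2, "label": "kick"},
]
(cs_tr, cs_te), (cv_tr, cv_te) = split_samples(samples, {1}, {1})
```

Because the held-out axis differs (people versus viewpoints), an algorithm can do well under one protocol and poorly under the other, which is exactly the distinction the paper's experiments probe.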

Notably, the research demonstrates that handcrafted features remain valuable, particularly on small datasets, where these methods are less prone to overfitting. Among the handcrafted approaches, SCK+DCK achieved the highest average accuracy in cross-subject recognition, indicating its robustness in capturing complex action dynamics from limited training data. Depth-based features, by contrast, struggled with cross-view recognition because of their sensitivity to viewpoint changes.

In contrast, deep learning approaches showed considerable promise, especially when leveraging larger datasets such as NTU RGB+D. ST-GCN and IndRNN, both of which learn from skeleton sequences end to end, exhibited top performance on this dataset. These findings underscore the potential of deep learning models to adapt to new and complex environments, provided sufficient training data is available.
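The core spatial operation behind graph-convolutional skeleton models such as ST-GCN can be sketched in a few lines: joint features are propagated over the skeleton's adjacency structure and then linearly projected. The three-joint skeleton, normalization choice, and weights below are toy assumptions for illustration, not the paper's or ST-GCN's exact configuration:

```python
import numpy as np

def spatial_graph_conv(X, A, W):
    """One spatial graph-convolution step: X' = D^{-1} (A + I) X W.

    X: (joints, in_features) joint features for one frame
    A: (joints, joints) skeleton adjacency (1 where joints are linked)
    W: (in_features, out_features) learned projection
    """
    A_hat = A + np.eye(A.shape[0])            # add self-connections
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # normalize by joint degree
    return D_inv @ A_hat @ X @ W

# Toy 3-joint chain (e.g. shoulder-elbow-wrist) with 3-D coordinates.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(3, 3))
W = np.eye(3)  # identity projection keeps the example transparent
out = spatial_graph_conv(X, A, W)
```

Stacking such spatial layers with temporal convolutions over frames is what lets these models learn action dynamics end to end, which is why their advantage grows with dataset size.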

The paper's comparison table highlights that certain algorithms, such as HDG with all of its features combined, perform exceptionally well in cross-view settings. This points to the efficacy of fusing depth and skeleton features to improve recognition across varying viewpoints and occlusions, with the caveat that noise in the depth data can still hinder performance.
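The idea of combining the two modalities can be illustrated with a minimal sketch. The per-modality L2 normalization and plain concatenation below are illustrative assumptions, not the specific fusion scheme HDG uses:

```python
import numpy as np

def fuse_features(depth_feat, skel_feat):
    """Fuse a depth-based and a skeleton-based descriptor into one vector.

    Each modality is L2-normalized first so neither descriptor's scale
    dominates the concatenated representation (a common, simple choice).
    """
    d = depth_feat / (np.linalg.norm(depth_feat) + 1e-8)
    s = skel_feat / (np.linalg.norm(skel_feat) + 1e-8)
    return np.concatenate([d, s])

# Toy descriptors: a 4-D depth feature and a 6-D skeleton feature.
depth_feat = np.ones(4)
skel_feat = np.ones(6)
fused = fuse_features(depth_feat, skel_feat)
```

The fused vector then feeds a standard classifier; the appeal is that skeleton features contribute view robustness while depth features add surface detail, though noisy depth maps can still degrade the combined descriptor.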

The implications of this research are substantial for both practical applications and theoretical advancements in AI. Practically, the findings can inform the choice of algorithms in contexts like smart surveillance, HCI, and healthcare monitoring, where robustness across diverse environments is critical. Theoretically, the nuanced insights into the strengths and limitations of current algorithms can direct future research efforts to refine feature extraction methods and deep learning architectures to overcome existing challenges in action recognition.

Looking ahead, the continuous evolution of deep learning represents a promising avenue for further improving action recognition. The ability of neural networks to learn richer representations from complex datasets positions them ideally to surpass the current state of the art, particularly by integrating robust feature selection mechanisms and domain adaptation techniques to minimize the impact of noise and occlusion.

In conclusion, this comprehensive review offers a lucid snapshot of the current landscape of Kinect-based action recognition, providing a clear trajectory for future exploration and enhancement in computer vision research domains.