Gait Recognition via Disentangled Representation Learning (1904.04925v1)

Published 9 Apr 2019 in cs.CV

Abstract: Gait, the walking pattern of individuals, is one of the most important biometrics modalities. Most of the existing gait recognition methods take silhouettes or articulated body models as the gait features. These methods suffer from degraded recognition performance when handling confounding variables, such as clothing, carrying and view angle. To remedy this issue, we propose a novel AutoEncoder framework to explicitly disentangle pose and appearance features from RGB imagery and the LSTM-based integration of pose features over time produces the gait feature. In addition, we collect a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since it contains minimal gait cues compared to other views. FVG also includes other important variations, e.g., walking speed, carrying, and clothing. With extensive experiments on CASIA-B, USF and FVG datasets, our method demonstrates superior performance to the state of the arts quantitatively, the ability of feature disentanglement qualitatively, and promising computational efficiency.

Citations (214)

Summary

  • The paper introduces a novel GaitNet model that disentangles pose and appearance features using an autoencoder framework with specialized loss functions.
  • It employs an LSTM to aggregate temporal pose data, achieving state-of-the-art performance on CASIA-B, USF, and the new FVG dataset with improved runtime efficiency.
  • The approach paves the way for future applications in vision tasks like facial expression and activity recognition by delivering robust, invariant feature extraction.

Gait Recognition via Disentangled Representation Learning

The paper "Gait Recognition via Disentangled Representation Learning" introduces a novel approach for improving gait recognition by effectively disentangling pose and appearance features from RGB imagery. This approach is particularly significant as it addresses the limitations of existing gait recognition methods that rely on silhouettes or articulated body models, which often suffer from reduced performance under variations like clothing, carrying conditions, and different view angles.

The core of the method is a deep model named GaitNet, which uses an autoencoder framework to achieve feature disentanglement. The encoder splits each frame into two latent representations: pose features and appearance features. Disentanglement is enforced by two loss functions: a cross-reconstruction loss and a gait similarity loss. The cross-reconstruction loss requires that the appearance features of one frame, combined with the pose features of another frame, reconstruct that second frame. The gait similarity loss requires the gait features of the same individual to remain consistent across different walking conditions. A minimal sketch of these two losses follows.
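A minimal PyTorch sketch may clarify the idea. The encoder/decoder interfaces, feature shapes, and the use of mean-squared error here are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def cross_reconstruction_loss(encoder, decoder, frame_a, frame_b):
    """Appearance of frame_a + pose of frame_b should rebuild frame_b.

    Hypothetical interface: `encoder` is assumed to return an
    (appearance, pose) pair of latent vectors for a frame.
    """
    app_a, _ = encoder(frame_a)       # appearance features of frame a
    _, pose_b = encoder(frame_b)      # pose features of frame b
    recon_b = decoder(app_a, pose_b)  # combine latents and decode
    return F.mse_loss(recon_b, frame_b)

def gait_similarity_loss(pose_seq_c1, pose_seq_c2):
    """Time-averaged pose features of the same subject under two
    conditions (e.g., different clothing) should match.

    pose_seq_*: tensors of shape (T, D) of per-frame pose features;
    averaging over time is an assumed, simplified form of the loss.
    """
    return F.mse_loss(pose_seq_c1.mean(dim=0), pose_seq_c2.mean(dim=0))
```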

GaitNet couples this encoder with a Long Short-Term Memory (LSTM) network that aggregates pose features over time into the final gait feature representation. This temporal modeling captures the dynamic aspects of an individual's walking pattern, which are essential for recognition; a sketch of the aggregation step is shown below.
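A hedged sketch of the temporal aggregation, again with assumed dimensions and names (the paper's exact pooling of LSTM outputs may differ):

```python
import torch
import torch.nn as nn

class TemporalAggregator(nn.Module):
    """Aggregate per-frame pose features into a single gait feature.

    Illustrative sketch: feature sizes and the mean over hidden
    states are assumptions made for clarity.
    """
    def __init__(self, pose_dim=128, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(pose_dim, hidden_dim, batch_first=True)

    def forward(self, pose_seq):
        # pose_seq: (batch, T, pose_dim) per-frame pose features
        outputs, _ = self.lstm(pose_seq)  # (batch, T, hidden_dim)
        # Average hidden states over time -> fixed-size gait feature
        return outputs.mean(dim=1)        # (batch, hidden_dim)

# Usage sketch: gait features of two clips can then be compared,
# e.g., with cosine similarity, for recognition.
```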

The researchers also introduce the Frontal-View Gait (FVG) dataset, collected specifically for frontal-view gait recognition, a challenging setting because frontal views expose fewer gait cues than other views. The dataset covers important variations, including walking speed, carrying, and clothing, captured from multiple frontal-view angles. It is a valuable benchmark for evaluating gait recognition under conditions common in real-world surveillance, where cameras often face approaching subjects.

Quantitative results show that GaitNet outperforms state-of-the-art methods on multiple benchmarks, including CASIA-B, USF, and the newly introduced FVG dataset. The method remains robust under challenging variations and offers promising computational efficiency, with faster runtime than several competing methods.

Theoretically, this work suggests a path for disentangling representations in other vision tasks, potentially extending to facial expression recognition and activity recognition, where motion dynamics are crucial yet often confounded by other factors.

Looking ahead, the combination of disentangled representation learning and temporal feature aggregation could extend to other domains involving video data and to other biometric modalities. More broadly, the approach points toward deep models that extract robust, invariant features while mitigating the influence of varying external conditions.
