Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision

Published 29 Mar 2018 in cs.CV | (1803.11097v1)

Abstract: Face anti-spoofing is the crucial step to prevent face recognition systems from a security breach. Previous deep learning approaches formulate face anti-spoofing as a binary classification problem. Many of them struggle to grasp adequate spoofing cues and generalize poorly. In this paper, we argue the importance of auxiliary supervision to guide the learning toward discriminative and generalizable cues. A CNN-RNN model is learned to estimate the face depth with pixel-wise supervision, and to estimate rPPG signals with sequence-wise supervision. Then we fuse the estimated depth and rPPG to distinguish live vs. spoof faces. In addition, we introduce a new face anti-spoofing database that covers a large range of illumination, subject, and pose variations. Experimental results show that our model achieves the state-of-the-art performance on both intra-database and cross-database testing.




Summary

The paper introduces a CNN-RNN architecture that integrates auxiliary supervision through depth estimation and rPPG signal extraction to distinguish live from spoofed faces.
It uses pixel-wise supervision for spatial (depth) cues and sequence-wise supervision for temporal (pulse) cues, and reports state-of-the-art results in both intra-database and cross-database testing.
The paper also introduces the Spoof in the Wild (SiW) database, which covers a wide range of illumination, subject, and pose variations.

Overview of Deep Models for Face Anti-Spoofing with Auxiliary Supervision

Face anti-spoofing is a critical component of biometric security systems, particularly those that rely on face recognition. Earlier deep learning methods treated face anti-spoofing as a binary classification task: given a face, predict live or spoof. Trained with only that binary label, such models often fail to learn adequate spoofing cues and generalize poorly. This paper instead adds auxiliary supervision, in the form of face depth map estimation and remote photoplethysmography (rPPG) signal estimation, to steer learning toward cues that are both discriminative and generalizable.

Contributions and Methodology

The authors introduce a CNN-RNN architecture that simultaneously estimates face depth and rPPG signals from input video sequences. The depth estimation leverages pixel-wise supervision, highlighting spatial differences between live and spoofed faces. Live faces exhibit natural depth variations, unlike the flat depth profiles typically associated with spoofing attacks such as print or replay attacks. Temporally, rPPG signals offer a means of extracting pulse information, which is present in live videos but absent or distorted in spoofed ones.
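The two kinds of supervision described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not give the loss functions or the fusion rule, so the mean-squared-error losses, the weighted sum of squared norms in `liveness_score`, the weight `lam`, and the decision `threshold` are all hypothetical choices. The sketch also assumes that spoof samples are supervised toward an all-zero depth map and an all-zero rPPG signal.

```python
from typing import List, Sequence

Depth = List[List[float]]  # H x W depth map; zero everywhere means "flat" (assumption)


def squared_norm(values: Sequence[float]) -> float:
    """Sum of squared entries."""
    return sum(v * v for v in values)


def depth_loss(pred: Depth, target: Depth) -> float:
    """Pixel-wise supervision: mean squared error over every pixel."""
    diffs = [p - t for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)]
    return squared_norm(diffs) / len(diffs)


def rppg_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sequence-wise supervision: mean squared error over the whole signal."""
    diffs = [p - t for p, t in zip(pred, target)]
    return squared_norm(diffs) / len(diffs)


def liveness_score(depth: Depth, rppg: Sequence[float], lam: float = 0.5) -> float:
    """Hypothetical fusion: energy of the rPPG estimate plus weighted depth energy."""
    flat_depth = [v for row in depth for v in row]
    return squared_norm(rppg) + lam * squared_norm(flat_depth)


def is_live(depth: Depth, rppg: Sequence[float],
            lam: float = 0.5, threshold: float = 1.0) -> bool:
    """Classify as live when the fused score exceeds an assumed threshold."""
    return liveness_score(depth, rppg, lam) > threshold


# Toy estimates: a live face has depth relief and a periodic pulse signal,
# while a print or replay attack is flat and carries no pulse.
live_depth = [[0.2, 0.8], [0.8, 0.2]]
live_rppg = [0.5, -0.5, 0.5, -0.5]
spoof_depth = [[0.0, 0.0], [0.0, 0.0]]
spoof_rppg = [0.0, 0.0, 0.0, 0.0]

print(is_live(live_depth, live_rppg))    # live sample
print(is_live(spoof_depth, spoof_rppg))  # spoof sample
```

Under these assumptions, a spoof sample whose estimated depth and rPPG are both near zero yields a score near zero, while a live sample accumulates energy from both cues. In practice the weight and the threshold would be tuned on held-out data.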

To support training and evaluation, the authors also introduce the Spoof in the Wild (SiW) database. It covers a wide range of illumination conditions, subjects, and poses, addressing the narrower variation found in earlier anti-spoofing databases.

Experimental Results

Extensive experiments evaluate the proposed model in both intra-database and cross-database testing, and the authors report state-of-the-art performance in both settings. Results are reported as error rates, where lower is better: Attack Presentation Classification Error Rate (APCER), Bona Fide Presentation Classification Error Rate (BPCER), and Average Classification Error Rate (ACER). The cross-database results are the more important ones, because they show the CNN-RNN model generalizing to data collected under conditions different from those seen in training.
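For readers unfamiliar with these metrics, the following sketch computes them from binary decisions. It is a simplified illustration: the function name and the label encoding (1 for bona fide, 0 for attack) are assumptions made here, and all attack types are pooled into one group, whereas standardized evaluation protocols typically compute APCER per attack type.

```python
from typing import Sequence, Tuple


def anti_spoofing_errors(labels: Sequence[int],
                         predictions: Sequence[int]) -> Tuple[float, float, float]:
    """Return (APCER, BPCER, ACER) for binary decisions.

    Encoding (an assumption of this sketch): 1 = bona fide (live), 0 = attack (spoof).
    """
    attack_preds = [p for y, p in zip(labels, predictions) if y == 0]
    bona_fide_preds = [p for y, p in zip(labels, predictions) if y == 1]
    if not attack_preds or not bona_fide_preds:
        raise ValueError("need at least one attack and one bona fide sample")

    # APCER: fraction of attack presentations wrongly accepted as bona fide.
    apcer = sum(1 for p in attack_preds if p == 1) / len(attack_preds)
    # BPCER: fraction of bona fide presentations wrongly rejected as attacks.
    bpcer = sum(1 for p in bona_fide_preds if p == 0) / len(bona_fide_preds)
    # ACER: mean of the two error rates.
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer


labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 1]
apcer, bpcer, acer = anti_spoofing_errors(labels, predictions)
print(apcer, bpcer, acer)  # 0.5 0.25 0.375
```

In this toy example two of four attacks are accepted (APCER 0.5) and one of four live faces is rejected (BPCER 0.25), so ACER is 0.375.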

Implications and Future Directions

The use of auxiliary information presents a significant shift in the way deep learning models can approach anti-spoofing tasks. By tapping into spatial-temporal cues that inherently differentiate live presentations from spoofed attacks, this approach paves the way for more explainable and reliable face recognition systems.

Future research could integrate further auxiliary signals beyond depth and rPPG, such as other biometric indicators or sensing modalities. Robustness to large variations in input quality and capture conditions also remains an open direction, particularly against more sophisticated spoofing techniques.

The introduction of the SiW database also sets a precedent for assembling comprehensive, high-variation datasets critical to advancing anti-spoofing technologies. As biometric spoofing methods evolve, the continuous update and refinement of such datasets will be crucial to maintaining the efficacy of anti-spoofing algorithms.

Conclusion

This paper advances face anti-spoofing by replacing purely binary supervision with auxiliary supervision through depth and rPPG estimation, which addresses two weaknesses of earlier models: poor generalization and failure to capture discriminative spoofing cues. These contributions are directly relevant to strengthening biometric security systems in both personal and enterprise settings.
