DSFD: Dual Shot Face Detector (1810.10220v3)

Published 24 Oct 2018 in cs.CV

Abstract: In this paper, we propose a novel face detection network with three novel contributions that address three key aspects of face detection, including better feature learning, progressive loss design and anchor assign based data augmentation, respectively. First, we propose a Feature Enhance Module (FEM) for enhancing the original feature maps to extend the single shot detector to dual shot detector. Second, we adopt Progressive Anchor Loss (PAL) computed by two different sets of anchors to effectively facilitate the features. Third, we use an Improved Anchor Matching (IAM) by integrating novel anchor assign strategy into data augmentation to provide better initialization for the regressor. Since these techniques are all related to the two-stream design, we name the proposed network as Dual Shot Face Detector (DSFD). Extensive experiments on popular benchmarks, WIDER FACE and FDDB, demonstrate the superiority of DSFD over the state-of-the-art face detectors.

Summary

The paper introduces DSFD, a dual shot framework integrating a Feature Enhance Module, Progressive Anchor Loss, and Improved Anchor Matching for robust face detection.
The method employs multi-level dilated convolutions and progressive anchor adjustment to effectively handle scale variations and class imbalance.
Experiments on WIDER FACE and FDDB show high accuracy, with AP of 96.6%, 95.7%, and 90.4% on the Easy, Medium, and Hard subsets of the WIDER FACE validation set.

An Overview of DSFD: Dual Shot Face Detector

The "DSFD: Dual Shot Face Detector" paper proposes a face detection network that targets three aspects of the problem: feature learning, loss design, and anchor assignment. The method, Dual Shot Face Detector (DSFD), is built around three components: the Feature Enhance Module (FEM), Progressive Anchor Loss (PAL), and Improved Anchor Matching (IAM).

Key Contributions and Methodology

Feature Enhance Module (FEM): The FEM makes the feature maps more discriminative and robust. Unlike a traditional Feature Pyramid Network (FPN), which only aggregates information across hierarchical layers, FEM also applies multi-level dilated convolutions to strengthen feature learning. The module draws on several sources of information, including neuron cells in the current layer and in neighboring layers, and yields features with richer semantics for detecting faces across varied scales and conditions.
Progressive Anchor Loss (PAL): PAL is a multi-task loss in which anchor sizes change progressively, not only across layers but also between the two detection shots. The first shot uses smaller anchors to capture the fine detail of small faces, while the second shot uses larger anchors to improve detection reliability and confidence. This progressive design makes better use of the hierarchical features and helps with the class imbalance inherent in face detection.
Improved Anchor Matching (IAM): IAM refines anchor assignment to reduce the mismatch between the wide range of face scales and the fixed anchor scales. It adjusts spatial sizes during data augmentation and uses an IoU threshold of 0.4 when matching anchors to faces, which makes the anchor distribution more consistent, gives the regressor a better initialization, and improves the overall detection rate.
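
The interaction between the two anchor sets and IoU-based matching can be illustrated with a small sketch. This is not the authors' implementation: the 0.4 threshold comes from the text above, while the best-overlap matching rule, the anchor centers, the anchor sizes (32 for the second shot, halved to 16 for the first shot), and the loss weight of 1.0 are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def square_anchor(cx: float, cy: float, size: float) -> Box:
    """Square anchor of the given side length centered at (cx, cy)."""
    half = size / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


def match_anchors(anchors: List[Box], faces: List[Box],
                  threshold: float = 0.4) -> List[int]:
    """For each anchor, return the index of its best-overlapping face,
    or -1 when the best IoU is below the threshold (a background anchor)."""
    matches = []
    for anchor in anchors:
        best_idx, best_iou = -1, 0.0
        for idx, face in enumerate(faces):
            overlap = iou(anchor, face)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        matches.append(best_idx if best_iou >= threshold else -1)
    return matches


def progressive_anchor_loss(first_shot_loss: float, second_shot_loss: float,
                            weight: float = 1.0) -> float:
    """Combine the two shot losses; the weighting scheme is an assumption."""
    return first_shot_loss + weight * second_shot_loss


# Hypothetical setup: two anchor centers and one small 16x16 face.
centers = [(16.0, 16.0), (48.0, 16.0)]
faces = [(8.0, 8.0, 24.0, 24.0)]

second_shot_anchors = [square_anchor(cx, cy, 32.0) for cx, cy in centers]
first_shot_anchors = [square_anchor(cx, cy, 16.0) for cx, cy in centers]  # assumed half size

first_matches = match_anchors(first_shot_anchors, faces)
second_matches = match_anchors(second_shot_anchors, faces)
```

In this toy setup the small face is matched only by a first-shot anchor: the 32-pixel anchor overlaps it with an IoU of 0.25, below the 0.4 threshold, whereas the 16-pixel anchor coincides with it exactly.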

Experimental Results

DSFD is evaluated on the WIDER FACE and FDDB datasets. On the WIDER FACE validation set, it reaches AP of 96.6% on the Easy subset, 95.7% on Medium, and 90.4% on Hard. On FDDB, its ROC results are competitive with existing state-of-the-art methods. These results support DSFD's ability to handle faces with large variations in scale, pose, occlusion, and other challenging conditions.

Implications and Future Prospects

The DSFD framework not only advances the technical architecture for face detection but also lays the groundwork for further enhancements in real-time applications, particularly those requiring high accuracy and low latency. The novel loss function and anchor matching strategies could be adapted or extended to other detection tasks involving diverse object scales and conditions. Moreover, the efficient utilization of computational resources, such as omitting first-shot outputs during inference, highlights potential for optimizing model deployment in resource-constrained environments.
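
The point about omitting first-shot outputs at inference can be sketched as follows. This is a toy illustration rather than the released model: `DualShotSketch` and its four callables are hypothetical placeholders, and it assumes the first shot reads the original features while the second shot reads the enhanced ones.

```python
from typing import Any, Callable, Optional, Tuple


class DualShotSketch:
    """Toy two-branch detector; every component is a hypothetical callable."""

    def __init__(self,
                 backbone: Callable[[Any], Any],
                 enhance: Callable[[Any], Any],
                 first_head: Callable[[Any], Any],
                 second_head: Callable[[Any], Any]) -> None:
        self.backbone = backbone
        self.enhance = enhance
        self.first_head = first_head
        self.second_head = second_head

    def forward(self, image: Any, training: bool) -> Tuple[Optional[Any], Any]:
        original = self.backbone(image)      # original feature maps
        enhanced = self.enhance(original)    # enhanced feature maps
        second = self.second_head(enhanced)  # second shot is always computed
        if training:
            # The first shot is only needed to compute its loss term.
            return self.first_head(original), second
        # At inference the first-shot head is skipped entirely.
        return None, second


# Count head invocations to show which branches actually run.
calls = {"first": 0, "second": 0}


def _first(features: Any) -> Any:
    calls["first"] += 1
    return ("first", features)


def _second(features: Any) -> Any:
    calls["second"] += 1
    return ("second", features)


model = DualShotSketch(backbone=lambda img: img,
                       enhance=lambda feats: feats + 1,
                       first_head=_first,
                       second_head=_second)
train_out = model.forward(0, training=True)
infer_out = model.forward(0, training=False)
```

After one training pass and one inference pass, the first-shot head has run once and the second-shot head twice, so the first-shot branch adds no cost at deployment in this sketch.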

Conclusion

DSFD is a significant improvement in face detection, addressing long-standing challenges through changes to both model architecture and learning strategy. The practicality and adaptability of its components suggest promising avenues for future research and application, particularly in domains that rely on precise and robust face detection. Further work on optimizing detection models must balance computational cost, inference speed, and detection accuracy, all of which the DSFD approach takes into account.
