YOLO5Face: Why Reinventing a Face Detector (2105.12931v3)

Published 27 May 2021 in cs.CV

Abstract: Tremendous progress has been made on face detection in recent years using convolutional neural networks. While many face detectors use designs designated for detecting faces, we treat face detection as a generic object detection task. We implement a face detector based on the YOLOv5 object detector and call it YOLO5Face. We make a few key modifications to the YOLOv5 and optimize it for face detection. These modifications include adding a five-point landmark regression head, using a stem block at the input of the backbone, using smaller-size kernels in the SPP, and adding a P6 output in the PAN block. We design detectors of different model sizes, from an extra-large model to achieve the best performance to a super small model for real-time detection on an embedded or mobile device. Experiment results on the WiderFace dataset show that on VGA images, our face detectors can achieve state-of-the-art performance in almost all the Easy, Medium, and Hard subsets, exceeding the more complex designated face detectors. The code is available at https://github.com/deepcam-cn/yolov5-face.

Authors (4)

Delong Qi (3 papers)
Weijun Tan (11 papers)
Qi Yao (39 papers)
Jingfeng Liu (18 papers)

Citations (143)

Summary

YOLO5Face: A Comprehensive Review

The paper "YOLO5Face: Why Reinventing a Face Detector" builds a face detector on the YOLOv5 object detection framework instead of a face-specific design. The authors make a small set of modifications tailored to faces and call the result YOLO5Face. The approach treats face detection as a generic object detection task, adding a landmark regression head that is trained with the Wing loss function.

Methodological Overview

At the core of YOLO5Face is an adaptation of YOLOv5 that adds a five-point landmark regression head, so the detector predicts facial landmarks together with each face box. The paper describes a range of model sizes, from an extra-large model aimed at the best accuracy to a very small model for real-time detection. The smaller models use ShuffleNetV2 as an alternative backbone so that they can run on embedded or mobile devices. The paper reports state-of-the-art (SOTA) performance for these small models under constrained computational resources.

Key Modifications

Several modifications, most of them architectural, underpin the transition from YOLOv5 to YOLO5Face:

Landmark Regression: A landmark regression head, trained with the Wing loss, is added so that the network predicts five facial key points for each detected face.
Network Enhancements: The Focus layer of YOLOv5 is replaced with a Stem block, which the authors report improves generalization and reduces computational complexity.
SPP and P6 Blocks: The SPP block uses smaller kernels, and a P6 output is added in the PAN block. Together these changes are intended to improve detection across a wide range of face sizes.
Data Augmentation Adjustments: The augmentation pipeline is customized for faces, notably by dropping augmentations found less effective for face detection, such as up-down flipping.
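
To make the Wing loss in the landmark regression item concrete, the following is a minimal sketch of its standard piecewise form: logarithmic for small errors, linear for large ones, with a constant that joins the two pieces. The function names and the defaults w = 10 and eps = 2 are assumptions chosen for illustration; this summary does not state the values used in YOLO5Face.

```python
import math

def wing_loss(error, w=10.0, eps=2.0):
    """Wing loss for one landmark coordinate error (prediction minus target).

    w and eps are illustrative defaults, not values confirmed by the summary.
    """
    x = abs(error)
    # Constant that makes the log piece and the linear piece meet at |x| = w.
    c = w - w * math.log(1.0 + w / eps)
    if x < w:
        return w * math.log(1.0 + x / eps)  # amplifies small and medium errors
    return x - c                            # behaves like L1 for large errors

def landmark_loss(pred, target, w=10.0, eps=2.0):
    """Sum the Wing loss over the 10 values (x, y) of five landmarks."""
    return sum(wing_loss(p - t, w, eps) for p, t in zip(pred, target))

# Hypothetical predicted and ground-truth landmark coordinates for one face.
pred = [30.5, 40.0, 60.2, 41.1, 45.0, 55.3, 33.0, 70.4, 58.9, 71.0]
target = [30.0, 40.0, 60.0, 40.0, 45.0, 56.0, 33.0, 70.0, 59.0, 70.0]
print(landmark_loss(pred, target))
```

Compared with a plain L2 loss, the logarithmic region gives relatively more weight to small localization errors, which is the usual reason given for using this loss on landmarks.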

Experimental Evaluation

The results, obtained on the WiderFace dataset, cover the Easy, Medium, and Hard subsets. On VGA-resolution images, YOLO5Face matches or exceeds existing detectors in almost all of the three subsets, as measured by mean average precision (mAP). The results on the Hard subset indicate that the detector remains robust in challenging scenarios characterized by occlusions and diverse scales.
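
The role of the added P6 output in handling diverse face scales can be illustrated with simple grid arithmetic. The sketch below assumes the standard YOLOv5 output strides of 8, 16, and 32 for P3 to P5 and a stride of 64 for P6; these strides and the 640x640 input size are assumptions for illustration and are not stated in this summary.

```python
# Assumed strides: 8, 16, 32 are the standard YOLOv5 output strides;
# 64 is the stride assumed here for the added P6 output.
STRIDES = {"P3": 8, "P4": 16, "P5": 32, "P6": 64}

def grid_sizes(height, width, strides=STRIDES):
    """Return the (rows, cols) of the prediction grid at each output level."""
    sizes = {}
    for level, stride in strides.items():
        if height % stride or width % stride:
            raise ValueError(
                f"input {height}x{width} is not a multiple of stride {stride}"
            )
        sizes[level] = (height // stride, width // stride)
    return sizes

for level, (rows, cols) in grid_sizes(640, 640).items():
    s = STRIDES[level]
    print(f"{level}: {rows}x{cols} cells, each covering {s}x{s} input pixels")
```

Under these assumptions the finest level has an 80x80 grid and the P6 level a 10x10 grid. Coarser grids correspond to larger receptive fields, which is the usual motivation for adding a level aimed at large faces.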

In tabulated comparisons with widely used models such as DSFD and RetinaFace, the YOLO5Face models, which range from large to ultra-light, show favorable trade-offs between accuracy and speed. The range of model sizes supports different deployment needs, from high-performance environments to resource-limited mobile applications.

Implications and Future Directions

YOLO5Face's results support the paper's position that face detection can be treated as a special case of general object detection. The availability of models at several scales makes the approach adaptable to applications with real-time requirements, such as surveillance and mobile applications.

The landmark regression head extends the detector's utility beyond detection, making it useful for tasks that require precise facial alignment, such as face recognition. Moreover, the open-source release invites community-driven enhancements and further study of cross-domain adaptability, as demonstrated by evaluations on datasets like FDDB.
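
As an illustration of how five predicted landmarks can feed an alignment step before recognition, the sketch below fits a least-squares similarity transform (scale, rotation, translation) from detected landmarks to a reference template. This is a generic technique, not a procedure described in the summary. The template coordinates, the landmark order (left eye, right eye, nose, left and right mouth corners), and the function names are hypothetical.

```python
import cmath
import math

# Hypothetical landmark template for a 112x112 aligned crop.
TEMPLATE = [(38.0, 52.0), (74.0, 52.0), (56.0, 72.0), (42.0, 92.0), (70.0, 92.0)]

def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points to dst points.

    Points are treated as complex numbers, so the transform is dst ~ a*src + b,
    where abs(a) is the scale and the phase of a is the rotation angle.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    a = num / den
    b = w_mean - a * z_mean
    return a, b

def apply_similarity(a, b, point):
    """Apply the transform to one (x, y) point."""
    p = a * complex(*point) + b
    return (p.real, p.imag)

# Hypothetical detected landmarks in image coordinates.
detected = [(210.0, 180.0), (282.0, 176.0), (248.0, 222.0), (220.0, 262.0), (276.0, 259.0)]
a, b = fit_similarity(detected, TEMPLATE)
print("scale:", abs(a), "rotation (deg):", math.degrees(cmath.phase(a)))
print("left eye maps to:", apply_similarity(a, b, detected[0]))
```

The fitted transform can then be used to warp the face crop so that the landmarks land near the template positions. This sketch does not handle reflections or outlier landmarks.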

Future work may further tune the balance between detection speed and accuracy, explore newer architectures such as transformers, or adapt the detector to more complex environments with heavier occlusion and more varied facial expressions.

In conclusion, YOLO5Face is a competitive alternative among face detectors, built from a small number of targeted modifications to a proven object detection framework. The approach shows that an existing architecture can be reused and specialized efficiently, which suggests similar adaptations for other visual recognition tasks.

GitHub

GitHub - deepcam-cn/yolov5-face: YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931, ECCV Workshops 2022) (2,093 stars)