HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining (2303.05675v1)

Published 10 Mar 2023 in cs.CV

Abstract: Human-centric perceptions include a variety of vision tasks, which have widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrained model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmark and pretraining methods. Specifically, we propose HumanBench, a benchmark based on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge in human bodies, we further propose a Projector AssisTed Hierarchical pretraining method (PATH) to learn diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that our PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2 datasets. The code will be publicly available at https://github.com/OpenGVLab/HumanBench.

Citations (25)

Summary

The paper introduces PATH, a projector assisted hierarchical pre-training method that effectively improves model generalization across 19 datasets and 6 tasks.
It consolidates diverse human-centric data to reduce computational redundancies and lower deployment costs by enabling a universal pre-trained model.
Comprehensive evaluations reveal state-of-the-art performance on 17 out of 19 datasets, underscoring the practical impact of the proposed approach.

Overview of HumanBench: A Benchmark for Human-centric Perception with Projector Assisted Pretraining

The paper introduces HumanBench, a benchmark for evaluating how well pre-trained models generalize across human-centric perception tasks. It provides a common framework for pre-training and evaluating models on diverse datasets covering six tasks: person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.

HumanBench: Comprehensive Benchmark Design

HumanBench presents a methodical approach to evaluating the efficacy of pre-training models on 19 datasets from six distinct tasks. It focuses on assessing the generalization ability of these models, addressing a significant gap in machine learning research where human-centric tasks often remain isolated, resulting in computational redundancies and inflated deployment costs. HumanBench leverages large volumes of human-centric data, consolidating information across various datasets to facilitate the development of a universal pre-trained model applicable to multiple downstream tasks.

Projector Assisted Hierarchical Pretraining (PATH)

The paper introduces PATH, a pre-training method that combines hierarchical weight sharing with task-specific projectors to capture human-centric features at multiple granularity levels. Weights are shared at different levels of the hierarchy: some parameters are common to all datasets, while others are specific to a task or to a single dataset. This arrangement is intended to reduce task conflicts, a common issue in multi-task pre-training. By learning both coarse-grained and fine-grained human-centric information, PATH improves the model's adaptability to diverse downstream applications.
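
The weight-sharing idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes a layout of one backbone shared by every dataset, one projector per task, and one head per dataset, and it replaces the real neural network modules with plain random linear maps. The class name HierarchicalModel, the helper functions, and the dataset names are all hypothetical.

```python
import random


def linear(in_dim, out_dim, rng):
    """Return a random weight matrix: out_dim rows of in_dim floats."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class HierarchicalModel:
    """Toy layout: shared backbone -> per-task projector -> per-dataset head."""

    def __init__(self, in_dim, feat_dim, dataset_specs, seed=0):
        # dataset_specs maps dataset name -> (task name, output dimension).
        rng = random.Random(seed)
        self.dataset_task = {name: task for name, (task, _) in dataset_specs.items()}
        self.backbone = linear(in_dim, feat_dim, rng)  # shared by all datasets
        self.projectors = {}                           # one per task
        self.heads = {}                                # one per dataset
        for name, (task, out_dim) in dataset_specs.items():
            if task not in self.projectors:
                self.projectors[task] = linear(feat_dim, feat_dim, rng)
            self.heads[name] = linear(feat_dim, out_dim, rng)

    def forward(self, x, dataset):
        task = self.dataset_task[dataset]
        feats = apply(self.backbone, x)                   # general features
        task_feats = apply(self.projectors[task], feats)  # task-specific features
        return apply(self.heads[dataset], task_feats)     # dataset-specific output


# Hypothetical datasets: two share the "reid" task, one belongs to "pose".
specs = {
    "reid_a": ("reid", 8),
    "reid_b": ("reid", 5),
    "pose_a": ("pose", 17),
}
model = HierarchicalModel(in_dim=4, feat_dim=6, dataset_specs=specs)
out = model.forward([1.0, 0.5, -0.5, 2.0], "reid_b")
print(len(model.projectors), len(model.heads), len(out))  # prints: 2 3 5
```

In this toy setup the two ReID datasets reuse the same projector but keep separate heads, so dataset-specific quirks stay in the heads while the backbone receives updates from every dataset.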

Numerical Results and State-of-the-art Achievements

Comprehensive evaluations indicate that PATH achieves new state-of-the-art performance on 17 of the 19 downstream datasets and on-par results on the remaining 2. These outcomes suggest that the pre-trained model extracts features well suited to human-centric perception tasks. The paper also reports that PATH outperforms existing pre-trained models such as MAE and CLIP, particularly in settings where human-centric features matter most.

Implications and Future Directions

The HumanBench paper offers several implications for both practical applications and theoretical advancement in AI and computer vision:

Efficiency in Model Development: By enabling a general pre-trained model to be effectively applied across a wide range of human-centric tasks, HumanBench can markedly reduce the computational burden involved in developing task-specific models.
Improved Real-world Deployment: Because PATH-pretrained models perform on par with or better than prior methods across multiple datasets, a single model can be deployed across application domains with less task-specific engineering.
Changes in Benchmarking Standards: HumanBench could encourage similar multi-task benchmarks in other domains, promoting generalization and efficiency.
Future Research Directions: The success of the PATH method suggests several avenues for future exploration, including the refinement of projector modules, further study of weight-sharing strategies, and the expansion of the framework to include other perceptual domains, such as audio or textual human-centric data.

In conclusion, HumanBench and the PATH methodology represent a significant stride toward efficient and generalizable human-centric perception systems. The methodology not only advances our capability to build robust pre-training models but also sets new benchmarks for future research and application in AI-driven human-centered task domains.