HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining (2303.05675v1)

Published 10 Mar 2023 in cs.CV

Abstract: Human-centric perceptions include a variety of vision tasks, which have widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrain model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmark and pretraining methods. Specifically, we propose a \textbf{HumanBench} based on existing datasets to comprehensively evaluate on the common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge in human bodies, we further propose a \textbf{P}rojector \textbf{A}ssis\textbf{T}ed \textbf{H}ierarchical pretraining method (\textbf{PATH}) to learn diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that our PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2 datasets. The code will be publicly at \href{https://github.com/OpenGVLab/HumanBench}{https://github.com/OpenGVLab/HumanBench}.

Citations (25)

Summary

  • The paper introduces PATH, a projector assisted hierarchical pre-training method that effectively improves model generalization across 19 datasets and 6 tasks.
  • It consolidates diverse human-centric data to reduce computational redundancies and lower deployment costs by enabling a universal pre-trained model.
  • Comprehensive evaluations reveal state-of-the-art performance on 17 out of 19 datasets, underscoring the practical impact of the proposed approach.

Overview of HumanBench: A Benchmark for Human-centric Perception with Projector Assisted Pretraining

The paper introduces HumanBench, a benchmark for pretraining and evaluating human-centric perception models across diverse vision tasks. It provides a common framework for comparing machine learning models on tasks that include person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.

HumanBench: Comprehensive Benchmark Design

HumanBench presents a methodical approach to evaluating the efficacy of pre-training models on 19 datasets from six distinct tasks. It focuses on assessing the generalization ability of these models, addressing a significant gap in machine learning research where human-centric tasks often remain isolated, resulting in computational redundancies and inflated deployment costs. HumanBench leverages large volumes of human-centric data, consolidating information across various datasets to facilitate the development of a universal pre-trained model applicable to multiple downstream tasks.
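The benchmark's core idea, one pretrained backbone evaluated on common ground across many task/dataset pairs, can be sketched as a simple harness. This is an illustrative sketch, not the official HumanBench code: the task and dataset names mirror those mentioned in the paper, while `evaluate` and `finetune_and_score` are hypothetical stand-ins for task-specific finetuning and metric computation.

```python
# Hypothetical evaluation harness: run one pretrained backbone through
# every task/dataset pair and collect a per-dataset score, so different
# pretraining methods can be compared on the same grid.

TASKS = {
    "person_reid": ["Market1501", "MSMT17"],
    "pose_estimation": ["COCO-Pose", "AIC"],
    "human_parsing": ["LIP", "CIHP"],
    "attribute_recognition": ["PA-100K"],
    "pedestrian_detection": ["CrowdHuman"],
    "crowd_counting": ["ShanghaiTech"],
}

def evaluate(pretrained_backbone, finetune_and_score):
    """Score the backbone on every (task, dataset) pair in the benchmark."""
    results = {}
    for task, datasets in TASKS.items():
        for name in datasets:
            # finetune_and_score stands in for task-specific finetuning
            # followed by that task's standard metric (mAP, PCK, mIoU, ...).
            results[(task, name)] = finetune_and_score(
                pretrained_backbone, task, name
            )
    return results

# Toy usage with a dummy scorer that returns a constant.
scores = evaluate("vit-base", lambda backbone, task, dataset: 0.5)
print(len(scores))  # 9 dataset entries in this illustrative subset
```

The full benchmark spans 19 datasets; the subset above only illustrates the looping structure and the common result format.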

Projector Assisted Hierarchical Pretraining (PATH)

The paper introduces PATH, a pretraining methodology that combines hierarchical weight sharing with task-specific projectors to capture multi-scale human-centric features. The method organizes weight sharing across task-specific and dataset-specific levels, thereby minimizing task conflicts, a common issue in multi-task pretraining. By learning both coarse-grained and fine-grained human-centric information, PATH improves the model's adaptability to diverse downstream applications.
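The shared-backbone-plus-projector idea can be illustrated with a minimal NumPy sketch, assuming a drastically simplified model: a single weight matrix stands in for the shared backbone, and each task gets its own small projection matrix. The dimensions, task names, and `forward` function are illustrative, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D, P = 16, 8  # backbone feature dim, projector output dim (illustrative)

# Shared backbone: one weight matrix standing in for a full transformer.
W_backbone = rng.standard_normal((D, D))

# Task-specific projectors: lightweight modules that adapt the shared
# features to each task's granularity (coarse for ReID, fine for parsing).
projectors = {
    "reid": rng.standard_normal((D, P)),
    "parsing": rng.standard_normal((D, P)),
}

def forward(x, task):
    shared = np.tanh(x @ W_backbone)  # shared representation for all tasks
    return shared @ projectors[task]  # task-routed projection

x = rng.standard_normal((4, D))       # batch of 4 feature vectors
out_reid = forward(x, "reid")
out_parse = forward(x, "parsing")
print(out_reid.shape)  # (4, 8): same backbone, different task outputs
```

The design point is that gradients from different tasks meet only in the shared backbone, while the projectors absorb task-specific signals, which is how conflicts between heterogeneous tasks can be reduced.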

Numerical Results and State-of-the-art Achievements

Comprehensive evaluations indicate that PATH achieves new state-of-the-art performance in 17 out of 19 downstream datasets and maintains competitive performance in the remaining tasks. These outcomes highlight the model's superior capability in extracting relevant features for human-centric perception tasks. Such results are significant, demonstrating how the model outperforms existing pre-trained models such as MAE and CLIP, particularly in contexts where human-centric features are paramount.

Implications and Future Directions

The HumanBench paper offers several implications for both practical applications and theoretical advancement in AI and computer vision:

  1. Efficiency in Model Development: By enabling a general pre-trained model to be effectively applied across a wide range of human-centric tasks, HumanBench can markedly reduce the computational burden involved in developing task-specific models.
  2. Improved Real-world Deployment: Because PATH-enhanced models perform on par with or better than specialized models across multiple datasets, they can be deployed across application domains more seamlessly.
  3. Alterations in Benchmarking Standards: HumanBench could drive the future of AI development towards establishing similar benchmarks in other domains, promoting generalization and efficiency.
  4. Future Research Directions: The success of the PATH method suggests several avenues for future exploration, including refinement of the projector modules, further study of weight-sharing strategies, and expansion of the framework to include other perceptual domains, such as audio or textual human-centric data.

In conclusion, HumanBench and the PATH methodology represent a significant stride toward efficient and generalizable human-centric perception systems. The methodology not only advances our capability to build robust pre-training models but also sets new benchmarks for future research and application in AI-driven human-centered task domains.
