detrex: Benchmarking Detection Transformers (2306.07265v2)
Abstract: The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the field currently lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop detrex, a unified, highly modular, and lightweight codebase that supports a majority of the mainstream DETR-based instance recognition algorithms, covering fundamental tasks including object detection, segmentation, and pose estimation. We conduct extensive experiments with detrex and provide a comprehensive benchmark of DETR-based models. Moreover, we enhance the performance of detection transformers by refining training hyper-parameters, providing strong baselines for the supported algorithms. We hope that detrex can offer the research community a standardized and unified platform to evaluate and compare different DETR-based models while fostering a deeper understanding of, and driving advancements in, DETR-based instance recognition. Our code is available at https://github.com/IDEA-Research/detrex. The project is under active development, and we encourage the community to use the detrex codebase for further development and contributions.
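To make the "unified and modular" claim concrete, the sketch below shows how a supported baseline might be loaded and its training hyper-parameters overridden, assuming detrex follows Detectron2-style LazyConfig configuration (Detectron2's `LazyConfig.load` and `instantiate` are real APIs; the config path and the specific field names used here are illustrative assumptions, not verified against the repository).

```python
# Minimal sketch: load a detrex-style LazyConfig, refine hyper-parameters, build the model.
# Assumptions: detrex configs follow Detectron2's LazyConfig conventions; the config path
# and field names (optimizer.lr, train.max_iter) are illustrative and may differ in the
# actual repository layout.
from detectron2.config import LazyConfig, instantiate

# Hypothetical path to a supported baseline config (e.g., a DINO ResNet-50 12-epoch recipe).
cfg = LazyConfig.load("projects/dino/configs/dino_r50_4scale_12ep.py")

# Example hyper-parameter refinements of the kind benchmarked in the paper.
cfg.optimizer.lr = 1e-4      # base learning rate
cfg.train.max_iter = 90_000  # roughly 12 epochs on COCO at batch size 16

# Instantiate the DETR-variant model described by the config.
model = instantiate(cfg.model)
print(type(model).__name__)
```

In practice, training would typically be launched through the repository's training scripts rather than by instantiating the model directly; the sketch only illustrates the config-driven, modular design the abstract describes.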