- The paper introduces Group DETR, a group-wise one-to-many assignment mechanism that speeds up DETR training and improves mAP.
- Multiple object query groups act as automatically learned query augmentation and strengthen supervision, with decoder self-attention applied independently within each group.
- Experimental results show significant gains, including a 5.0 mAP improvement on Conditional DETR-C5 under a 12-epoch training schedule.
Overview of Group DETR: Fast Training with Group-Wise One-to-Many Assignment
The paper introduces Group DETR, an approach that trains Detection Transformer (DETR) models with a group-wise one-to-many assignment mechanism. The method accelerates DETR training while preserving its NMS-free, end-to-end detection pipeline.
Highlights of Group DETR
One-to-Many Assignment Challenge in DETR
DETR traditionally employs a one-to-one assignment strategy, associating each ground-truth object with a single prediction, which eliminates the need for non-maximum suppression (NMS) during post-processing. One-to-many assignment, where one ground-truth object supervises multiple predictions, has been successful in detectors such as Faster R-CNN and FCOS, but applying it directly to DETR has been difficult: the extra positives encourage duplicate predictions that break the NMS-free design.
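To make the contrast concrete, here is a minimal Python sketch of the two assignment styles on a toy matching-cost matrix. The names and shapes (`cost`, `k`) are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy matching cost: rows = 3 ground-truth objects, cols = 10 predictions.
cost = np.random.rand(3, 10)

# One-to-one (DETR-style): Hungarian matching picks exactly one prediction
# per object, so duplicate suppression is learned rather than done by NMS.
gt_idx, pred_idx = linear_sum_assignment(cost)

# One-to-many (Faster R-CNN / FCOS-style): each object supervises its k
# lowest-cost predictions, which yields duplicates and typically needs NMS.
k = 3
topk_pred_idx = np.argsort(cost, axis=1)[:, :k]
```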
Group DETR Approach
Group DETR introduces a group-wise one-to-many assignment strategy: it uses multiple object query groups, matches objects one-to-one within each group, and applies decoder self-attention independently per group (see the sketch after this list). This design achieves several benefits:
- Data Augmentation Equivalent: The extra query groups amount to automatically learned object query augmentation, giving a data-augmentation-like effect without introducing external queries.
- Improved Supervision: All groups share the same network parameters, so each training image supervises the decoder multiple times, which strengthens supervision and accelerates convergence.
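The following is a hedged PyTorch sketch of how the group-wise scheme could be realized, assuming `num_groups` query groups of `queries_per_group` queries each; the sizes and helper names are illustrative, not the authors' implementation:

```python
import torch
from scipy.optimize import linear_sum_assignment

num_groups, queries_per_group = 11, 300          # illustrative sizes
total = num_groups * queries_per_group

# Block-diagonal mask: a query may only attend to queries in its own group.
# With nn.MultiheadAttention, a boolean attn_mask entry of True means
# "not allowed to attend", so we start from all-True and open the blocks.
attn_mask = torch.ones(total, total, dtype=torch.bool)
for g in range(num_groups):
    s = g * queries_per_group
    attn_mask[s:s + queries_per_group, s:s + queries_per_group] = False

def group_wise_assignment(match_cost: torch.Tensor):
    """match_cost: (total, num_gt) matching cost between queries and objects.
    Runs Hungarian matching independently per group, so every object gets
    one positive query per group -- num_groups positives overall."""
    matches = []
    for g in range(num_groups):
        s = g * queries_per_group
        rows, cols = linear_sum_assignment(
            match_cost[s:s + queries_per_group].detach().cpu().numpy())
        matches.append((rows + s, cols))  # offset back to global query indices
    return matches
```

Masking self-attention per group keeps the duplicate-suppression role of self-attention intact inside each group, which is why one-to-one matching within a group does not reintroduce duplicates.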
Training and Inference Process
During training, Group DETR runs multiple groups of object queries through the decoder, providing richer supervision and faster convergence. Inference remains identical to standard DETR: only one query group is used, so the model needs no architectural modification at deployment, as the sketch below illustrates.
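A minimal sketch of this train/inference asymmetry, again with illustrative sizes and names:

```python
import torch

num_groups, queries_per_group, hidden_dim = 11, 300, 256  # illustrative sizes
query_embed = torch.nn.Embedding(num_groups * queries_per_group, hidden_dim)

def decoder_queries(training: bool) -> torch.Tensor:
    if training:
        # All groups pass through the decoder, multiplying the supervision.
        return query_embed.weight
    # Only the first group is kept, so inference matches standard DETR.
    return query_embed.weight[:queries_per_group]
```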
Experimental Results
Group DETR demonstrates significant improvements in training efficiency and prediction accuracy across multiple DETR variants. It accelerates training convergence noticeably, as evidenced by improved mAP scores:
- Conditional DETR-C5 achieved a 5.0 mAP gain under a 12-epoch training schedule.
- Consistent improvements were observed in other variants such as DAB-DETR and further DETR-based methods, and extend to multi-view 3D object detection and instance segmentation tasks.
Theoretical and Practical Implications
The group-wise assignment stabilizes training through stronger decoder supervision and frames automatically learned query augmentation as a promising direction for further study. Because the extra supervision requires no architectural change, the framework can be integrated into diverse detection tasks and serves as a general recipe for efficient DETR training.
Future Developments
While Group DETR has shown marked gains in training efficiency, future work may refine group-wise assignment strategies and extend DETR frameworks to other application domains. Deeper integration with other self-attention mechanisms and adaptive query designs could further improve the robustness and flexibility of transformer-based object detection systems.