TOOD: Task-aligned One-stage Object Detection (2108.07755v3)

Published 17 Aug 2021 in cs.CV

Abstract: One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between learning task-interactive and task-specific features, as well as a greater flexibility to learn the alignment via a task-aligned predictor. Second, we propose Task Alignment Learning (TAL) to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training via a designed sample assignment scheme and a task-aligned loss. Extensive experiments are conducted on MS-COCO, where TOOD achieves a 51.1 AP at single-model single-scale testing. This surpasses the recent one-stage detectors by a large margin, such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP), with fewer parameters and FLOPs. Qualitative results also demonstrate the effectiveness of TOOD for better aligning the tasks of object classification and localization. Code is available at https://github.com/fcjian/TOOD.

Summary

The paper proposes TOOD, a unified framework that aligns classification and localization tasks via a novel Task-aligned Head and Task Alignment Learning.
The method dynamically refines anchor assignment using a combined metric of classification scores and IoU, achieving an average precision of 51.1 on the MS-COCO dataset.
Improved task interaction reduces feature conflicts and enhances spatial precision, paving the way for efficient real-time object detection applications.

Task-aligned One-stage Object Detection (TOOD)

The paper "TOOD: Task-aligned One-stage Object Detection" addresses the spatial misalignment that arises in one-stage object detectors. These detectors typically optimize object classification and localization in two parallel branches, so the anchor that gives the best classification score is not necessarily the one that gives the most accurate box. The paper proposes TOOD (Task-aligned One-stage Object Detection), which explicitly aligns the two tasks through a new head design and a new training scheme.

Key Contributions

Task-aligned Head (T-Head):

The authors introduce a Task-aligned Head designed to foster better interaction between classification and localization tasks. Unlike traditional parallel heads, T-Head computes task-interactive features, thus promoting collaborative task learning. The architectural innovation lies in the Task-aligned Predictor (TAP), which employs a layer attention mechanism to dynamically compute task-specific features, optimizing the interaction and alignment of the two tasks.
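The layer attention idea can be illustrated with a small NumPy sketch. It assumes that the interactive features from N stacked layers are globally average-pooled, passed through two fully connected layers (ReLU, then sigmoid), and that the resulting N weights rescale each layer's feature map for one task. The function name, the weight matrices `w1` and `w2`, and all shapes are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def layer_attention(inter_feats, w1, w2):
    """Reweight stacked interactive features with one scalar per layer.

    inter_feats: array of shape (N, C, H, W), N stacked layer outputs.
    w1: (hidden, N*C) and w2: (N, hidden) are hypothetical FC weights.
    """
    # Global average pooling over the spatial dimensions, then flatten.
    pooled = inter_feats.mean(axis=(2, 3)).reshape(-1)    # (N*C,)
    hidden = np.maximum(w1 @ pooled, 0.0)                 # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))        # sigmoid, (N,)
    # Each layer's feature map is scaled by its own attention weight.
    task_feats = inter_feats * weights[:, None, None, None]
    return task_feats, weights

# Illustrative shapes only: 6 layers, 4 channels, 5x5 feature map.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4, 5, 5))
w1 = rng.normal(size=(8, 6 * 4))
w2 = rng.normal(size=(6, 8))
task_feats, weights = layer_attention(feats, w1, w2)
```

In a real head, a separate pair of weight matrices would be learned for classification and for localization, so that each task can emphasize different layers of the shared features.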

Task Alignment Learning (TAL):

To address the misalignment during training as well, the paper presents Task Alignment Learning. It combines a sample assignment scheme with a task-aligned loss to pull the optimal anchors for the two tasks closer together, or even unify them. TAL focuses training on task-aligned anchors by means of an anchor alignment metric that combines the classification score and the IoU of each anchor.
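As a rough illustration of such a metric, the sketch below uses the form t = s^alpha * u^beta, where s is the classification score and u is the IoU. The default values alpha=1 and beta=6 are assumptions chosen for illustration and should be checked against the paper or the official code before being relied on.

```python
import numpy as np

def alignment_metric(cls_score, iou, alpha=1.0, beta=6.0):
    """Anchor alignment metric t = s**alpha * u**beta.

    alpha and beta are illustrative defaults (assumptions), trading off
    the influence of the classification score s against the IoU u.
    """
    s = np.asarray(cls_score, dtype=float)
    u = np.asarray(iou, dtype=float)
    return s ** alpha * u ** beta

# Three hypothetical anchors for one object.
scores = np.array([0.9, 0.5, 0.8])
ious = np.array([0.5, 0.9, 0.8])
t = alignment_metric(scores, ious)
best = int(np.argmax(t))  # index of the most task-aligned anchor
```

With these numbers, the anchor with the highest classification score (index 0) gets a low metric because its box is poor. A metric of this form only rewards anchors that do well on both tasks.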

Methodological Insights

The T-Head reduces feature conflicts between the two tasks and aligns their spatial predictions. It first computes task-interactive features shared by both tasks, then makes task-specific predictions, and finally adjusts them: a spatial probability map rescales the classification scores, and offset maps shift the box predictions. This alignment step is what improves joint classification and localization precision.
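The classification side of this adjustment can be sketched in a few lines. The sketch assumes the aligned score is the geometric mean of the raw class probability and a single-channel spatial probability map; the function and variable names are hypothetical, and the offset-based box adjustment (which needs bilinear sampling) is left out for brevity.

```python
import numpy as np

def align_classification(cls_prob, spatial_prob):
    """Combine class probabilities with a spatial probability map.

    cls_prob: (num_classes, H, W) probabilities in [0, 1].
    spatial_prob: (1, H, W) map in [0, 1], broadcast over classes.
    Assumed form: elementwise geometric mean sqrt(P * M).
    """
    return np.sqrt(cls_prob * spatial_prob)

# Toy example: 3 classes on a 2x2 feature map.
cls_prob = np.full((3, 2, 2), 0.81)
spatial_prob = np.array([[[0.25, 1.0], [0.0, 0.25]]])
aligned = align_classification(cls_prob, spatial_prob)
```

Locations where the spatial map is close to 1 keep most of their score, while locations where it is close to 0 are suppressed regardless of the raw class probability.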

TAL assigns anchors dynamically rather than through a fixed, predefined rule. Because the anchor alignment metric reflects both classification and localization quality, the same metric drives the selection of positive samples and the weighting of the loss.
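A minimal sketch of how one metric could drive both steps is given below. It assumes the top-k anchors per object (ranked by the alignment metric) become positives, that the metric is normalized per object so its maximum equals the largest IoU among the positives, and that the normalized value serves as a soft label in a focal-style binary cross-entropy. The names, `topk=13`, and `gamma=2.0` are illustrative assumptions, and details such as resolving anchors claimed by several objects are omitted.

```python
import numpy as np

def assign_and_normalize(t, iou, topk=13):
    """t, iou: (num_gt, num_anchors). Returns positive mask and soft labels."""
    k = min(topk, t.shape[1])
    top_idx = np.argsort(-t, axis=1)[:, :k]          # best-k anchors per object
    pos_mask = np.zeros(t.shape, dtype=bool)
    np.put_along_axis(pos_mask, top_idx, True, axis=1)
    t_pos = np.where(pos_mask, t, 0.0)
    max_t = t_pos.max(axis=1, keepdims=True)
    max_iou = np.where(pos_mask, iou, 0.0).max(axis=1, keepdims=True)
    # Rescale so the largest soft label per object equals its largest IoU.
    t_hat = t_pos / (max_t + 1e-9) * max_iou
    return pos_mask, t_hat

def task_aligned_cls_loss(score, t_hat, gamma=2.0):
    """Focal-style BCE against soft labels; negatives have t_hat == 0."""
    s = np.clip(score, 1e-9, 1.0 - 1e-9)
    bce = -(t_hat * np.log(s) + (1.0 - t_hat) * np.log(1.0 - s))
    return np.abs(t_hat - s) ** gamma * bce

# One object, four hypothetical anchors, keep the best two.
t = np.array([[0.1, 0.4, 0.2, 0.05]])
iou = np.array([[0.5, 0.9, 0.8, 0.3]])
pos_mask, t_hat = assign_and_normalize(t, iou, topk=2)
loss = task_aligned_cls_loss(np.array([0.9, 0.5]), np.array([0.9, 0.0]))
```

In this toy case the two selected anchors receive soft labels of about 0.9 and 0.45, and a prediction that already matches its soft label contributes no loss, while a confident score on a negative anchor is penalized.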

Empirical Results

On the MS-COCO dataset, TOOD reaches 51.1 AP with single-model, single-scale testing, ahead of recent one-stage detectors such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP). It does so with fewer parameters and FLOPs, so the gain does not come from a larger model. The improvement is most visible in AP75, which only counts detections with an IoU of at least 0.75 and therefore reflects more precise localization.

Implications and Future Directions

TOOD narrows the gap between classification and localization in one-stage detectors and provides a framework that can accommodate future advances. The plug-and-play nature of its components suggests they could be applied in a range of detection settings. Its efficiency may also make it attractive for real-time applications, where processing speed is critical.

The paper opens avenues for further exploration in task interaction mechanisms and the potential integration with emerging network designs. Given the alignment efficacy, future work might explore extending these concepts beyond object detection, potentially influencing fields such as object tracking and segmentation.

In conclusion, TOOD offers an effective solution to a longstanding problem in one-stage object detection. Its design lets the two tasks interact during both feature extraction and training, and its results on MS-COCO support the value of the proposed head and learning scheme.

GitHub

GitHub - fcjian/TOOD: TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral (319 stars)