- The paper presents BACL, a unified framework combining Foreground Classification Balance Loss and Dynamic Feature Hallucination to mitigate long-tailed data bias.
- It employs pairwise class-aware margins and automatic weight adjustments to enhance classifier focus on underrepresented tail categories.
- Experiments on the LVIS benchmark show a 16.1% AP gain on tail categories and a 5.8% overall AP improvement over Faster R-CNN with a ResNet-50-FPN backbone.
Balanced Classification: A Unified Framework for Long-Tailed Object Detection
The paper "Balanced Classification: A Unified Framework for Long-Tailed Object Detection" introduces the Balanced Classification (BACL) framework as a novel solution to the challenges posed by imbalanced datasets in object detection. In scenarios involving long-tailed data distributions, object detection models often suffer from biased learning, favoring abundant head categories over rare tail categories. This paper aims to address this issue with a structured approach divided into two key components: Foreground Classification Balance Loss (FCBL) and a Dynamic Feature Hallucination Module (FHM).
Key Contributions and Methodology
The proposed BACL framework primarily tackles two challenges in long-tailed object detection: the unequal competition among categories arising from imbalanced data and the paucity of diverse samples in tail categories.
- Foreground Classification Balance Loss (FCBL): FCBL introduces pairwise class-aware margins and automatic weight adjustments to correct classification bias. By balancing the suppression gradients that categories exert on one another during training, it shifts the classifier's focus toward difficult-to-differentiate categories and curbs the dominance of head categories over tail categories (a minimal sketch of the margin idea follows this list).
- Dynamic Feature Hallucination Module (FHM): FHM addresses the limited sample diversity of tail categories by synthesizing additional features that mimic their natural variability. Using a reparameterization technique, it draws new feature vectors from estimated per-category feature distributions, enriching training with novel examples that aid generalization (see the second sketch below).
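The paper's exact loss formulation isn't reproduced in this summary, but the core margin idea can be sketched. The following PyTorch snippet is a minimal illustration, assuming a log-frequency-ratio margin matrix; the margin definition, the `scale` parameter, and the class-count input are illustrative assumptions, not the authors' exact FCBL.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseMarginLoss(nn.Module):
    """Minimal sketch of a classification loss with pairwise class-aware
    margins; the margin definition below is a hypothetical stand-in for
    the paper's FCBL, not its exact formulation."""

    def __init__(self, class_counts, scale=0.1):
        super().__init__()
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # Hypothetical pairwise margin m[i][j]: larger when class j is more
        # frequent than class i, so rarer classes must be separated from
        # head classes by a wider gap.
        log_ratio = torch.log(counts[None, :] / counts[:, None])
        self.register_buffer("margins", scale * log_ratio.clamp(min=0))

    def forward(self, logits, targets):
        # Adding the ground-truth class's margin row to competing logits
        # rebalances the suppression gradient head classes exert on tails.
        adjusted = logits + self.margins[targets]
        return F.cross_entropy(adjusted, targets)

# Example: three foreground classes with long-tailed counts.
loss_fn = PairwiseMarginLoss(class_counts=[10000, 500, 20])
loss = loss_fn(torch.randn(8, 3), torch.randint(0, 3, (8,)))
```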
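Similarly, the feature hallucination step can be illustrated with the standard Gaussian reparameterization trick. In the sketch below, the per-class running statistics, the `momentum` value, and the diagonal-Gaussian assumption are simplifications for illustration; the paper's module estimates feature distributions dynamically, which this stand-in only approximates.

```python
import torch

class FeatureHallucinator:
    """Sketch of reparameterized feature hallucination. The diagonal
    Gaussian per class is an illustrative assumption."""

    def __init__(self, num_classes, feat_dim, momentum=0.9):
        self.mu = torch.zeros(num_classes, feat_dim)
        self.var = torch.ones(num_classes, feat_dim)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats, labels):
        # Maintain running per-class mean/variance from real features.
        for c in labels.unique():
            f = feats[labels == c]
            m = self.momentum
            self.mu[c] = m * self.mu[c] + (1 - m) * f.mean(dim=0)
            self.var[c] = m * self.var[c] + (1 - m) * f.var(dim=0, unbiased=False)

    def sample(self, classes):
        # Reparameterization: feat = mu + sigma * eps with eps ~ N(0, I),
        # yielding novel but plausible features for rare classes.
        eps = torch.randn(len(classes), self.mu.shape[1])
        return self.mu[classes] + self.var[classes].sqrt() * eps

# Example: hallucinate four extra features for tail class 2.
h = FeatureHallucinator(num_classes=3, feat_dim=256)
h.update(torch.randn(16, 256), torch.randint(0, 3, (16,)))
fake = h.sample(torch.tensor([2, 2, 2, 2]))  # shape: (4, 256)
```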
These components are implemented within a decoupled training strategy that separates representation learning from classifier learning. By freezing the feature extractor during the classifier-learning phase, BACL retunes the classifier without degrading the learned representations, and the approach generalizes across datasets, architectures, and backbones. A rough sketch of the two-stage schedule follows.
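The sketch below assumes a detector object exposing `backbone`, `cls_head`, `loss`, and `forward_cls`; these are hypothetical interface names for illustration, not any specific library's API.

```python
import torch

def decoupled_training(detector, loader, cls_loss_fn, epochs=(12, 4)):
    """Two-stage training sketch: end-to-end representation learning,
    then classifier-only fine-tuning with the backbone frozen. The
    detector interface used here is hypothetical."""
    # Stage 1: representation learning with standard detection losses.
    opt = torch.optim.SGD(detector.parameters(), lr=0.02, momentum=0.9)
    for _ in range(epochs[0]):
        for images, targets in loader:
            opt.zero_grad()
            detector.loss(images, targets).backward()
            opt.step()

    # Stage 2: freeze the feature extractor so classifier updates cannot
    # disturb the learned representations, then retrain only the head
    # with the balanced classification loss (e.g., FCBL).
    for p in detector.backbone.parameters():
        p.requires_grad = False
    head_opt = torch.optim.SGD(detector.cls_head.parameters(), lr=0.02)
    for _ in range(epochs[1]):
        for images, targets in loader:
            head_opt.zero_grad()
            logits = detector.forward_cls(images)  # frozen features -> head
            cls_loss_fn(logits, targets).backward()
            head_opt.step()
```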
Numerical Results and Implications
BACL demonstrates significant improvements on the LVIS (Large Vocabulary Instance Segmentation) benchmark. With a ResNet-50-FPN backbone, it improves overall AP by 5.8% and tail-category AP by 16.1% over a conventional Faster R-CNN baseline. These results underline BACL's efficacy in equalizing performance across head and tail categories and the effectiveness of its components in mitigating learning biases.
The strong performance of BACL has several implications for future research. Practically, it offers a robust framework applicable to other imbalanced classification tasks. Theoretically, it motivates methodologies that pair dynamic feature augmentation with adaptive loss functions to further address data imbalance.
BACL's ability to generalize across datasets indicates its potential for adoption in real-world applications where data distributions are inherently imbalanced. Future work could enhance the approach with more sophisticated feature-diversification methods or extend the framework to additional recognition tasks.
In conclusion, BACL represents a significant step forward in addressing the challenges of long-tailed object detection. Its emphasis on balancing classification processes and enriching sample diversity offers valuable insights into tackling similar issues in other domains of machine learning.