GOOD: Towards Domain Generalized Orientated Object Detection (2402.12765v2)
Abstract: Oriented object detection has developed rapidly in the past few years, but most existing methods assume that training and testing images follow the same statistical distribution, which rarely holds in practice. In this paper, we propose the task of domain generalized oriented object detection, which explores how well oriented object detectors generalize to arbitrary unseen target domains. Learning domain generalized oriented object detectors is particularly challenging, as cross-domain style variation not only degrades the content representation but also leads to unreliable orientation predictions. To address these challenges, we propose a generalized oriented object detector (GOOD). After style hallucination with contrastive language-image pre-training (CLIP), GOOD relies on two key components: rotation-aware content consistency learning (RAC) and style consistency learning (SEC). RAC allows the oriented object detector to learn stable orientation representations from style-diversified samples. SEC further stabilizes the generalization ability of the content representation across different image styles. Extensive experiments on multiple cross-domain settings show that GOOD achieves state-of-the-art performance. Source code will be publicly available.
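To make the idea of learning from style-diversified samples more concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that "style" can be approximated by channel-wise feature statistics (in the spirit of adaptive instance normalization) and that consistency is measured with a simple mean squared error. The function names `restyle` and `consistency_loss` are hypothetical, and the target statistics here are arbitrary values standing in for the CLIP-driven style hallucination, whose details the abstract does not give.

```python
import numpy as np


def restyle(features, new_mean, new_std, eps=1e-5):
    """Re-stylize a (C, H, W) feature map by swapping channel statistics.

    Assumption: style is captured by per-channel mean and standard
    deviation; content is what remains after normalizing them away.
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    # Broadcast the (C,) target statistics over the spatial dimensions.
    return normalized * new_std[:, None, None] + new_mean[:, None, None]


def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two sets of predictions.

    A stand-in for a consistency objective: predictions on the original
    and the re-stylized sample are encouraged to agree.
    """
    return float(np.mean((pred_a - pred_b) ** 2))


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))      # toy feature map, C=4
target_mean = rng.normal(size=4)           # hypothetical "hallucinated" style
target_std = rng.uniform(0.5, 2.0, size=4)

stylized = restyle(features, target_mean, target_std)
loss = consistency_loss(features, stylized)
```

In this sketch the loss is computed directly on features for brevity; in a detector it would more plausibly be applied to outputs such as class scores or box and orientation predictions for the same regions.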