Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery (1807.02700v3)

Published 7 Jul 2018 in cs.CV

Abstract: Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.



Summary

The paper presents a novel multi-scale framework combining image cascade, FPN, and deformable inception networks to enhance object detection in remote sensing imagery.
It employs rotational region proposal and region of interest networks with an innovative loss function to accurately detect objects with arbitrary orientations.
Empirical results show large mAP gains on DOTA and competitive results on NWPU VHR-10 and UCAS-AOD, even when the model is trained on DOTA, demonstrating the method's practical effectiveness.

Analyzing Multi-class Object Detection in Unconstrained Remote Sensing Imagery

The paper "Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery" proposes a novel approach to address the complex challenges of detecting various objects in remote sensing (RS) imagery. This task is notoriously challenging due to factors like diverse object scales, arbitrary orientations, multiple categories, and complex backgrounds. Traditional methods often struggle with these issues, particularly when the images are acquired through different sensors and modalities.

Overview of the Proposed Method

The authors introduce an innovative methodology that leverages a joint image cascade and feature pyramid network (FPN) with multi-size convolution kernels. This method aims to extract both strong and weak semantic features across multiple scales, which are crucial for precise object detection in RS images. The framework is enhanced by a deformable inception network (DIN), which uses deformable convolutions to improve localization properties, particularly for small objects prevalent in RS data.
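To make the role of deformable convolutions concrete, here is a minimal sketch of how a single output value of a generic deformable convolution is computed: each kernel tap samples the feature map at its regular grid position plus a fractional offset, using bilinear interpolation. This is the general technique, not the authors' DIN; the function names are hypothetical, the offsets are supplied by hand (in a real network another convolution layer predicts them), and plain nested lists stand in for feature tensors.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly sample a 2D list `feature` at fractional (y, x); zero padding outside."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance from the integer grid point.
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * feature[yy][xx]
    return total


def deformable_conv_at(feature, weights, offsets, py, px, k=3):
    """One output value of a k x k deformable convolution centred at (py, px).

    weights: k*k kernel weights in row-major order.
    offsets: k*k (dy, dx) offsets; hand-supplied here (hypothetical), whereas
             a real network predicts them with an extra convolution layer.
    """
    r = k // 2
    # Regular sampling grid of a standard k x k convolution.
    grid = [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)]
    out = 0.0
    for w_k, (gy, gx), (dy, dx) in zip(weights, grid, offsets):
        # Each tap samples at its grid position shifted by a fractional offset.
        out += w_k * bilinear(feature, py + gy + dy, px + gx + dx)
    return out


feat = [[float(5 * y + x) for x in range(5)] for y in range(5)]
mean_kernel = [1 / 9] * 9
print(round(deformable_conv_at(feat, mean_kernel, [(0.0, 0.0)] * 9, 2, 2), 6))  # 12.0
print(round(deformable_conv_at(feat, mean_kernel, [(0.0, 0.5)] * 9, 2, 2), 6))  # 12.5
```

With all offsets at zero the operation reduces to an ordinary convolution; shifting every tap half a pixel to the right moves the sampled values accordingly. Letting the network learn these offsets is what allows the receptive field to adapt to object shape.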

The architecture also includes a rotational region proposal network (R-RPN) and a rotational region of interest (R-ROI) network that use multi-scale strategies to propose rotated bounding boxes, improving the detection of objects with arbitrary orientations. A final rotational non-maximum suppression (R-NMS) step removes redundant, overlapping detections.
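The R-NMS step differs from ordinary NMS mainly in how overlap is measured: the IoU of two rotated boxes requires intersecting two arbitrary convex quadrilaterals. The sketch below assumes a hypothetical (cx, cy, w, h, theta) box parameterization with theta in radians (the paper's exact convention may differ), computes the intersection with Sutherland-Hodgman polygon clipping, and applies greedy suppression. It is an illustrative CPU version, not the authors' implementation, and near-coincident edges are not handled robustly.

```python
import math


def box_to_corners(cx, cy, w, h, theta):
    """Corners of a rotated box (theta in radians), in counter-clockwise order."""
    c, s = math.cos(theta), math.sin(theta)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def polygon_area(pts):
    """Shoelace formula; returns 0.0 for fewer than three points."""
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_polygon(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` against a convex CCW `clipper`."""

    def inside(p, a, b):
        # p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of line p-q with line a-b (assumed non-parallel).
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        if not current:
            break
        prev = current[-1]
        for pt in current:
            if inside(pt, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, pt, a, b))
                output.append(pt)
            elif inside(prev, a, b):
                output.append(intersect(prev, pt, a, b))
            prev = pt
    return output


def rotated_iou(b1, b2):
    """IoU of two rotated boxes given as (cx, cy, w, h, theta)."""
    inter = polygon_area(clip_polygon(box_to_corners(*b1), box_to_corners(*b2)))
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0


def rotated_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best box, drop remaining boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if rotated_iou(boxes[best], boxes[j]) <= iou_thresh]
    return keep


boxes = [(0, 0, 2, 2, 0.0), (0.1, 0, 2, 2, 0.0), (10, 10, 2, 2, 0.0)]
print(rotated_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second box almost coincides with the first and is suppressed, while the distant third box survives. The 0.5 threshold is an illustrative default, not a value taken from the paper.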

Another contribution is a novel loss that enforces predicted oriented (quadrilateral) bounding boxes to be rectangular, reflecting the fact that oriented annotations in RS imagery are typically rectangles. During training, this loss is minimized jointly with the horizontal and oriented bounding box losses.
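The summary does not give the exact form of the rectangularity loss, so the following is only a hypothetical illustration of the idea, not the paper's formulation: a penalty that is zero when all four interior angles of a predicted quadrilateral are right angles and grows as they deviate, measured here as the mean squared cosine of the angle between adjacent edges. The function name and the `eps` stabilizer are assumptions; in training, the same arithmetic would be written in an autodiff framework so gradients flow to the predicted vertices.

```python
import math


def rectangularity_penalty(quad, eps=1e-12):
    """Hypothetical penalty: mean squared cosine of a quadrilateral's four angles.

    quad: four (x, y) vertices in order around the shape.
    Returns 0.0 for a perfect rectangle; larger values mean more skew.
    """
    total = 0.0
    for i in range(4):
        px, py = quad[i - 1]          # previous vertex (wraps around for i = 0)
        cx, cy = quad[i]              # corner being measured
        nx, ny = quad[(i + 1) % 4]    # next vertex
        e1 = (px - cx, py - cy)
        e2 = (nx - cx, ny - cy)
        dot = e1[0] * e2[0] + e1[1] * e2[1]
        norm = math.hypot(*e1) * math.hypot(*e2)
        # cos = 0 at a right angle, so each corner contributes 0 when square.
        total += (dot / (norm + eps)) ** 2
    return total / 4.0


print(rectangularity_penalty([(0, 0), (4, 0), (4, 2), (0, 2)]))  # 0.0
```

For a parallelogram such as (0, 0), (4, 0), (5, 2), (1, 2), every corner has a squared cosine of 0.2, so the penalty is about 0.2, pushing the regressor toward right angles.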

Empirical Evaluation and Results

The proposed model is evaluated on three datasets: DOTA, NWPU VHR-10, and UCAS-AOD. On DOTA, the method yields a mean average precision (mAP) of 68.16% for horizontal bounding box detection and 72.45% for oriented bounding box detection, outperforming all previously published methods by about 6 and 12 percentage points (absolute), respectively. When trained on DOTA, it also achieves results competitive with the baselines on NWPU VHR-10 and UCAS-AOD, indicating that it generalizes beyond its training dataset.
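As a reminder of what the reported mAP figures measure, the sketch below computes a per-class average precision with all-point interpolation (a PASCAL VOC-style variant) and averages it over classes. It assumes detections have already been matched to ground truth, e.g. at an IoU threshold of 0.5, using rotated IoU for the oriented task; the official DOTA evaluation toolkit may use a different interpolation variant, and the function names and class values below are illustrative only.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class (assumes num_gt > 0).

    scores: confidence of each detection.
    is_tp:  whether each detection matched a not-yet-matched ground-truth box.
    num_gt: number of ground-truth boxes of this class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the step-shaped precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_ap(per_class_ap):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
print(round(ap, 4))  # 0.8333
```

Because mAP is an unweighted mean over categories, improvements on rare or difficult classes count as much as those on frequent ones.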

The ablation studies isolate the contribution of each component: the image cascade, the choice of backbone network (ResNet-50/101, ResNeXt-101), and deformable convolutions. For instance, employing DIN with deformable convolutions improves the mAP by over 2%, and adding the rectangularity loss yields further gains.

Theoretical and Practical Implications

This research presents a substantial contribution to the domain of satellite and aerial image analysis by offering a robust solution capable of handling diverse imaging scenarios. The dual capability of dealing with horizontal and oriented bounding boxes makes it adaptable for complex environments where objects do not conform to standard orientations and sizes.

From a practical standpoint, the approach has broad implications for applications such as urban planning, disaster management, and vehicle monitoring in aerial imagery. The ability to generalize across datasets acquired from different sensors and conditions underlines the model's versatility and practical applicability in real-world scenarios.

Speculation on Future Developments

Future research could extend this model with more advanced domain adaptation techniques to handle RS images acquired across a broader range of sensor modalities and geographic regions. Integration with time-series datasets could also be explored for dynamic tracking applications. Additionally, advances in unsupervised learning could reduce the dependency on extensive labeled datasets, a current limitation in the RS domain.

In conclusion, this paper provides an effective and sophisticated solution for multi-class object detection in RS imagery, setting a foundation for future work in the intersection of deep learning and remote sensing technologies.
