Clustered Object Detection in Aerial Images (1904.08008v3)

Published 16 Apr 2019 in cs.CV

Abstract: Detecting objects in aerial images is challenging for at least two reasons: (1) target objects like pedestrians are very small in pixels, making them hardly distinguished from surrounding background; and (2) targets are in general sparsely and non-uniformly distributed, making the detection very inefficient. In this paper, we address both issues inspired by observing that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces object cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region is fed into DetecNet for object detection. ClusDet has several advantages over previous solutions: (1) it greatly reduces the number of chips for final object detection and hence achieves high running time efficiency, (2) the cluster-based scale estimation is more accurate than previously used single-object based ones, hence effectively improves the detection for small objects, and (3) the final DetecNet is dedicated for clustered regions and implicitly models the prior context information so as to boost detection accuracy. The proposed method is tested on three popular aerial image datasets including VisDrone, UAVDT and DOTA. In all experiments, ClusDet achieves promising performance in comparison with state-of-the-art detectors. Code will be available in https://github.com/fyangneil.

Summary

The paper introduces ClusDet, a network that integrates cluster proposal, scale estimation, and detection to efficiently improve small object detection in aerial images.
The methodology reduces processing costs by focusing on clustered regions and using ScaleNet to refine object scales for enhanced accuracy.
Experimental results on VisDrone, UAVDT, and DOTA datasets show improved average precision, outperforming traditional detectors like Faster R-CNN.

Clustered Object Detection in Aerial Images

The paper "Clustered Object Detection in Aerial Images" addresses two difficulties of object detection in aerial imagery: targets are very small in pixels, and they are sparsely and non-uniformly distributed across the image. The first hurts accuracy, because small targets are hard to distinguish from the background; the second hurts efficiency, because much of the image contains no targets at all. The proposed solution is a network named ClusDet, which unifies object clustering and detection in an end-to-end framework.

Key Components of the ClusDet Network

ClusDet is composed of three primary sub-networks:

Cluster Proposal Sub-network (CPNet): This network component predicts object cluster regions, effectively reducing the number of regions that need to be processed for object detection. This reduction in regions not only enhances computational efficiency but also exploits the clustered nature of targets within aerial images.
Scale Estimation Sub-network (ScaleNet): This component estimates the scale of objects within identified clusters, allowing for better handling of small-scale objects relative to the large image sizes often found in aerial datasets. This is crucial in ensuring that the objects maintain an appropriate scale in the detector's input space, improving the detector's performance.
Dedicated Detection Network (DetecNet): This network detects objects in the scale-normalized cluster regions. Because it is dedicated to clustered regions, it implicitly models the prior context within clusters, which boosts detection accuracy.
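
The flow through the three components can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Cluster` type, the `reference_size` target, and the `detect_chip` callback are hypothetical names introduced here, and the scale rule (resize so that the estimated object size reaches a fixed reference) is a simplifying assumption rather than the paper's ScaleNet formulation. The sketch shows only the bookkeeping: rescaling each cluster chip and mapping chip-level boxes back to full-image coordinates.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Cluster:
    box: Box            # cluster region in full-image coordinates (CPNet-style output)
    object_size: float  # estimated object size in pixels (ScaleNet-style output)


def scale_factor(object_size: float, reference_size: float = 64.0) -> float:
    """Resize factor that brings objects close to a reference size.

    Assumption for illustration only: reference_size is a hypothetical target.
    """
    return reference_size / object_size


def to_image_coords(chip_box: Box, cluster: Cluster, factor: float) -> Box:
    """Map a box found on the rescaled chip back to full-image coordinates."""
    origin_x, origin_y = cluster.box[0], cluster.box[1]
    x1, y1, x2, y2 = chip_box
    return (origin_x + x1 / factor, origin_y + y1 / factor,
            origin_x + x2 / factor, origin_y + y2 / factor)


def detect_in_clusters(
    clusters: List[Cluster],
    detect_chip: Callable[[Cluster, float, float], List[Box]],
) -> List[Box]:
    """Run a detector on every scale-normalized cluster chip.

    detect_chip is a hypothetical stand-in for a DetecNet-style detector: it
    receives the cluster and the rescaled chip width and height, and returns
    boxes in rescaled-chip coordinates.
    """
    results: List[Box] = []
    for cluster in clusters:
        factor = scale_factor(cluster.object_size)
        x1, y1, x2, y2 = cluster.box
        chip_w, chip_h = (x2 - x1) * factor, (y2 - y1) * factor
        for chip_box in detect_chip(cluster, chip_w, chip_h):
            results.append(to_image_coords(chip_box, cluster, factor))
    return results


# Usage with a fake detector that always reports one fixed box per chip.
clusters = [Cluster(box=(100.0, 200.0, 300.0, 400.0), object_size=16.0)]
boxes = detect_in_clusters(clusters, lambda c, w, h: [(40.0, 80.0, 104.0, 144.0)])
print(boxes)
```

In this toy run the estimated object size of 16 pixels gives a resize factor of 4, so a 64-pixel box on the enlarged chip corresponds to a 16-pixel box in the original image.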

Experimental Validation and Performance

The proposed ClusDet method was tested on three prominent aerial image datasets: VisDrone, UAVDT, and DOTA. Across all datasets, ClusDet demonstrated promising results when compared to state-of-the-art detectors. The paper highlights several performance enhancements:

Efficiency: ClusDet reduces computational cost by running the final detector only on clustered regions, which greatly reduces the number of chips (cropped image regions) that must be processed.
Accuracy: The integration of a cluster-based scale estimation (via ScaleNet) improves the object detection accuracy, particularly for small objects subjected to extreme down-scaling in conventional approaches.

For instance, in experiments on the VisDrone dataset, ClusDet outperformed baselines such as Faster R-CNN with the Feature Pyramid Network across several backbone architectures, achieving higher Average Precision (AP) overall and better results on small and mid-sized objects.
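
To make the efficiency argument concrete, the following sketch counts how many chips a plain uniform tiling would send to the detector and compares that with a cluster-based approach. The image size, tile size, overlap, and number of clusters are hypothetical values chosen for illustration; they are not figures from the paper.

```python
import math


def uniform_tile_count(width: int, height: int, tile: int, overlap: int = 0) -> int:
    """Number of square tiles needed to cover an image with a sliding window."""
    stride = tile - overlap
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols * rows


# Hypothetical 2000x1500 image, 500-pixel tiles.
tiles_no_overlap = uniform_tile_count(2000, 1500, 500)
tiles_overlap = uniform_tile_count(2000, 1500, 500, overlap=100)

# Hypothetical cluster-based run: three cluster chips for the same image.
cluster_chips = 3

print(tiles_no_overlap, tiles_overlap, cluster_chips)
```

With these made-up numbers, uniform tiling produces 12 chips without overlap and 20 with a 100-pixel overlap, whereas the cluster-based run processes only the three regions that actually contain grouped targets. The real saving depends on how many clusters are proposed for each image.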

Implications and Future Work

The research contributes to aerial object detection by addressing two challenges specific to the domain: small targets and uneven target distribution. By focusing detection on clustered regions and normalizing object scale within them, ClusDet opens the way for more efficient processing of high-resolution aerial imagery. Practically, this might lead to better resource allocation in real-time surveillance systems, which require efficient data processing.

Theoretically, ClusDet points towards a further exploration of context-based and cluster-aware detection approaches in other domains beyond aerial photography. Future directions could explore the integration of temporal data, consider dynamic applications in video surveillance, or extend cluster-based methodologies to other multi-scale and high-resolution imaging challenges. Moreover, the development of more sophisticated scale estimation algorithms and enhanced feature integration strategies in clustered environments could refine the capabilities of such systems further.

In conclusion, "Clustered Object Detection in Aerial Images" introduces a framework that improves both the efficiency and the accuracy of object detection in aerial images by detecting within scale-normalized cluster regions rather than across the whole image.