- The paper introduces a comprehensive dataset with 2806 high-resolution aerial images and 188,282 annotated instances across 15 object categories using oriented bounding boxes.
- The paper benchmarks state-of-the-art detection algorithms on both horizontal and oriented bounding boxes, revealing challenges in detecting small, densely packed, and variably oriented objects.
- The dataset lays a strong foundation for advancing aerial object detection research with practical applications in remote tracking, vehicle counting, and surveillance.
DOTA: A Large-scale Dataset for Object Detection in Aerial Images
The paper introduces DOTA, a large-scale dataset for object detection in aerial images and a significant contribution to object detection in Earth Vision. Object detection in aerial images is difficult because objects vary widely in scale, orientation, and shape, and because well-annotated datasets are scarce. DOTA addresses these challenges with a comprehensive and diverse dataset designed specifically for aerial imagery.
Dataset Composition and Annotation
DOTA comprises 2806 aerial images, each approximately 4000 x 4000 pixels, collected from various sensors and platforms. These images exhibit a wide range of object scales, orientations, and shapes. The dataset includes 15 common object categories: plane, ship, storage tank, baseball diamond, tennis court, swimming pool, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, and soccer ball field. Expert annotators labeled these images, producing 188,282 instances annotated with oriented bounding boxes (OBBs).
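To make the difference between oriented and horizontal boxes concrete, here is a minimal sketch. It assumes an OBB is stored as four (x, y) corner points; the names `obb_to_hbb`, `polygon_area`, and `diamond` are illustrative and not taken from the paper or any toolkit. It computes the enclosing horizontal box and compares the two areas.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def obb_to_hbb(corners: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (xmin, ymin, xmax, ymax) enclosing an OBB."""
    if len(corners) != 4:
        raise ValueError("an OBB is assumed to have exactly four corners")
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(corners: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(corners):
        x2, y2 = corners[(i + 1) % len(corners)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical object: a square rotated by 45 degrees.
diamond = [(2.0, 0.0), (4.0, 2.0), (2.0, 4.0), (0.0, 2.0)]

xmin, ymin, xmax, ymax = obb_to_hbb(diamond)
hbb_area = (xmax - xmin) * (ymax - ymin)
obb_area = polygon_area(diamond)
fill_ratio = obb_area / hbb_area  # fraction of the horizontal box covered by the object
```

In this example the oriented box covers only half of its enclosing horizontal box, which illustrates why horizontal boxes become loose for rotated objects and overlap heavily when such objects are densely packed.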
Key Contributions
The primary contributions of this work are twofold:
- Comprehensive Dataset: According to the authors, DOTA is the largest annotated object dataset with a wide variety of categories in Earth Vision. Its large scale and detailed annotations make it a valuable resource for developing and evaluating object detection algorithms in aerial images.
- Benchmarking: The authors benchmarked state-of-the-art object detection algorithms on DOTA, establishing baselines for future research. This benchmarking highlights the challenges of object detection in aerial imagery and provides a reference point for algorithm evaluation.
Evaluation and Baselines
The authors conducted extensive evaluations using several state-of-the-art object detection algorithms, including Faster R-CNN, R-FCN, YOLOv2, and SSD. These algorithms were evaluated on two tasks: detection on horizontal bounding boxes (HBB) and detection on oriented bounding boxes (OBB). The numerical results are summarized in the tables below:
HBB Evaluation Results (Average Precision):
| Category | YOLOv2 | R-FCN | Faster R-CNN | SSD |
|---|---|---|---|---|
| Plane | 76.9 | 81.01 | 80.32 | 57.85 |
| Ship | 52.37 | 49.29 | 50.04 | 24.74 |
| ... | ... | ... | ... | ... |
| Average | 39.2 | 52.58 | 60.46 | 29.86 |
OBB Evaluation Results (Average Precision):
| Category | YOLOv2 | R-FCN | SSD | Faster R-CNN (HBB) | Faster R-CNN (OBB) |
|---|---|---|---|---|---|
| Plane | 52.75 | 39.57 | 41.06 | 49.74 | 79.42 |
| Ship | 7.37 | 7.45 | 13.21 | 9.51 | 37.16 |
| ... | ... | ... | ... | ... | ... |
| Average | 25.492 | 30.84 | 17.84 | 39.95 | 54.13 |
The results show how difficult the OBB task is: the best baseline, the Faster R-CNN (OBB) variant, reaches an average AP of only 54.13, and its AP for Ship (37.16) is less than half its AP for Plane (79.42). These results underline the difficulty of detecting small, densely packed objects, and the lower performance on categories with high orientation and size variability highlights the need for further refinement in detection algorithms.
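For readers unfamiliar with the metric, the sketch below shows one common way to compute average precision, using all-point interpolation in the style of PASCAL VOC. This summary does not state which AP variant the paper uses, so treat this as an assumption. The sketch also assumes each detection has already been matched against ground truth and flagged as a true or false positive; the function names and sample values are hypothetical.

```python
from typing import List, Sequence


def average_precision(
    scores: Sequence[float],
    is_true_positive: Sequence[bool],
    num_ground_truth: int,
) -> float:
    """Area under the interpolated precision-recall curve for one category."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    if len(scores) != len(is_true_positive):
        raise ValueError("scores and is_true_positive must have equal length")

    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = 0
    fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Interpolate: precision at each rank becomes the best precision
    # achieved at that rank or any later one.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangle areas wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_category_ap: Sequence[float]) -> float:
    """Unweighted mean of per-category AP values."""
    return sum(per_category_ap) / len(per_category_ap)


# Hypothetical detections for one category with two ground-truth objects.
example_ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, False],
    num_ground_truth=2,
)
```

The same routine applies to both tasks; only the matching step differs, since overlap is measured between horizontal boxes for HBB and between oriented boxes for OBB.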
Implications and Future Work
DOTA provides a robust benchmark for object detection in aerial images, which has practical implications in areas such as vehicle counting, remote object tracking, and unmanned driving. The dataset's comprehensive annotations and diversity make it an invaluable resource for the development of more accurate and robust detection algorithms.
Theoretically, DOTA raises interesting questions about the generalization capabilities of object detection models across different domains. The dataset's complexity challenges existing models and exposes weaknesses in handling varied object orientations and scales.
Future developments could include the expansion of DOTA to include more categories and instances, further increasing its coverage and applicability. Additionally, improvements in algorithmic approaches, specifically aimed at addressing the challenges highlighted by DOTA, such as advanced handling of orientation and crowded instances, will be crucial.
In conclusion, DOTA represents a significant advancement in the field of object detection in Earth Vision. Its extensive dataset and rigorous benchmarking provide a solid foundation for ongoing research and development in aerial imagery object detection.