DOTA: A Large-scale Dataset for Object Detection in Aerial Images (1711.10398v3)

Published 28 Nov 2017 in cs.CV

Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to extend to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000-by-4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.

Authors (9)

Gui-Song Xia
Xiang Bai
Jian Ding
Zhen Zhu
Serge Belongie
Jiebo Luo
Mihai Datcu
Marcello Pelillo
Liangpei Zhang

Citations (1,963)


Summary

The paper introduces a comprehensive dataset with 2806 high-resolution aerial images and 188,282 annotated instances across 15 object categories using oriented bounding boxes.
The paper benchmarks state-of-the-art detection algorithms on both horizontal and oriented bounding boxes, revealing challenges in detecting small, densely packed, and variably oriented objects.
The dataset lays a strong foundation for advancing aerial object detection research, with practical applications in vehicle counting, remote object tracking, and unmanned driving.


The paper introduces DOTA, a large-scale dataset for object detection in aerial images and a significant contribution to object detection in Earth Vision. Detecting objects in aerial images is difficult because object instances vary widely in scale, orientation, and shape, and because well-annotated aerial datasets are scarce. DOTA addresses these challenges with a large, diverse dataset designed specifically for aerial imagery.

Dataset Composition and Annotation

DOTA comprises 2806 aerial images, each approximately 4000 × 4000 pixels, collected from various sensors and platforms. These images exhibit a wide range of object scales, orientations, and shapes. The dataset covers 15 common object categories: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, and swimming pool. Experts in aerial image interpretation labeled these images, producing 188,282 instances, each annotated with an arbitrary quadrilateral (8 degrees of freedom), referred to in this summary as an oriented bounding box (OBB).
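Each instance is an arbitrary quadrilateral, i.e. four (x, y) corners. The sketch below assumes a plain-text label line of the form `x1 y1 x2 y2 x3 y3 x4 y4 category difficult` (an assumption about the file layout, not something this summary specifies) and shows how a horizontal box for the HBB task can be derived by taking the coordinate-wise minimum and maximum of the corners. `QuadAnnotation`, `parse_label_line`, and `quad_to_hbb` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class QuadAnnotation:
    """One instance: four (x, y) corners (8 degrees of freedom) plus a label."""
    points: List[Point]
    category: str
    difficult: int  # assumed flag for hard-to-recognise instances


def parse_label_line(line: str) -> QuadAnnotation:
    """Parse an assumed 'x1 y1 x2 y2 x3 y3 x4 y4 category difficult' line."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}")
    coords = [float(v) for v in fields[:8]]
    # Pair up the eight numbers into four (x, y) corners.
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return QuadAnnotation(points=points, category=fields[8], difficult=int(fields[9]))


def quad_to_hbb(ann: QuadAnnotation) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) enclosing the quadrilateral."""
    xs = [x for x, _ in ann.points]
    ys = [y for _, y in ann.points]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    ann = parse_label_line("10 20 50 10 60 40 20 50 plane 0")
    print(ann.category, quad_to_hbb(ann))  # plane (10.0, 10.0, 60.0, 50.0)
```

The enclosing horizontal box always covers the quadrilateral, but for elongated, rotated objects it also covers a lot of background, which is one reason the OBB task is reported separately.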

Key Contributions

The primary contributions of this work are twofold:

Comprehensive Dataset: At the time of publication, DOTA was the largest annotated object dataset with a wide variety of categories in Earth Vision. Its scale and detailed annotations make it a valuable resource for developing and evaluating object detection algorithms for aerial images.
Benchmarking: The authors benchmarked state-of-the-art object detection algorithms on DOTA, establishing baselines for future research. This benchmarking highlights the challenges of object detection in aerial imagery and provides a reference point for algorithm evaluation.

Evaluation and Baselines

The authors conducted extensive evaluations using several state-of-the-art object detection algorithms, including Faster R-CNN, R-FCN, YOLOv2, and SSD. These algorithms were evaluated on two tasks: detection on horizontal bounding boxes (HBB) and detection on oriented bounding boxes (OBB). The numerical results are summarized in the tables below:

HBB Evaluation Results (Average Precision):

Category	YOLOv2	R-FCN	Faster R-CNN	SSD
Plane	76.9	81.01	80.32	57.85
Ship	52.37	49.29	50.04	24.74
...	...	...	...	...
Average	39.2	52.58	60.46	29.86
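Each table entry is a per-category average precision (AP), and the Average row corresponds to the mean over categories (mAP). As a rough illustration of how such a number is computed, the sketch below implements all-point interpolated AP in the PASCAL VOC style. Whether DOTA's official evaluation uses exactly this interpolation, and which IoU threshold decides a true positive, are assumptions not stated in this summary; matching detections to ground truth is taken as already done and passed in as booleans.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float],
                      is_true_positive: Sequence[bool],
                      num_ground_truth: int) -> float:
    """All-point interpolated AP for one category (VOC-style, assumed here)."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Add sentinel points, then make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    # Area under the interpolated precision-recall curve.
    ap = 0.0
    for k in range(1, len(mrec)):
        ap += (mrec[k] - mrec[k - 1]) * mpre[k]
    return ap


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """Unweighted mean of per-category APs, like the tables' Average row."""
    return sum(per_class_ap) / len(per_class_ap)
```

For three detections scored 0.9, 0.8, and 0.7, where the first and third are true positives and there are two ground-truth objects, `average_precision` returns 5/6 (about 0.833).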

OBB Evaluation Results (Average Precision):

Category	YOLOv2	R-FCN	SSD	Faster R-CNN (HBB)	Faster R-CNN (OBB)
Plane	52.75	39.57	41.06	49.74	79.42
Ship	7.37	7.45	13.21	9.51	37.16
...	...	...	...	...	...
Average	25.492	30.84	17.84	39.95	54.13
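In the OBB task, deciding whether a detection matches a ground-truth instance requires the overlap (IoU) between two quadrilaterals rather than two axis-aligned rectangles. A minimal sketch, assuming both polygons are convex, clips one polygon against the other with the Sutherland-Hodgman algorithm and measures areas with the shoelace formula. This is a generic technique chosen for illustration, not necessarily what the DOTA evaluation code does; non-convex quadrilaterals would need a general polygon clipper. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def to_ccw(poly: List[Point]) -> List[Point]:
    """Return the polygon with counter-clockwise vertex order."""
    return poly if signed_area(poly) >= 0 else list(reversed(poly))


def _side(a: Point, b: Point, p: Point) -> float:
    """Cross product (b - a) x (p - a); >= 0 means p is left of (or on) edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p1: Point, p2: Point, a: Point, b: Point) -> Point:
    """Point where segment p1->p2 crosses the line through a and b."""
    d1, d2 = _side(a, b, p1), _side(a, b, p2)
    t = d1 / (d1 - d2)  # one endpoint is inside and one outside, so d1 != d2
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def clip_convex(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` against the convex CCW polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        candidates, output = output, []
        for j, cur in enumerate(candidates):
            prev = candidates[j - 1]  # j - 1 == -1 wraps to the last vertex
            cur_in, prev_in = _side(a, b, cur) >= 0, _side(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_intersect(prev, cur, a, b))
        if not output:
            break  # no overlap left
    return output


def polygon_iou(p: List[Point], q: List[Point]) -> float:
    """IoU of two convex polygons, e.g. a detected and a ground-truth quadrilateral."""
    p, q = to_ccw(p), to_ccw(q)
    inter_poly = clip_convex(p, q)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(p)) + abs(signed_area(q)) - inter
    return inter / union if union > 0 else 0.0
```

For a unit square and the same square shifted right by half its width, the overlap is 0.5 and the union 1.5, so `polygon_iou` returns 1/3.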

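The results highlight how difficult small and densely packed objects are to detect, particularly in the OBB task, where every method scores a lower average than in the HBB task. Categories with high variability in orientation and size fare worst: ship AP in the OBB task ranges from 7.37 (YOLOv2) to 37.16 (Faster R-CNN OBB), compared with up to 79.42 for plane. These gaps point to the need for detection algorithms that handle arbitrary orientations and crowded scenes better.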
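Because each image is roughly 4000 × 4000 pixels while many targets are small relative to the image, detectors are commonly run on overlapping crops rather than on downscaled whole images, so small objects keep their pixel detail. The sketch below computes tile origins for such a sliding window. The tile size of 1024 and stride of 512 are illustrative values chosen for this example, not settings taken from this summary, and `tile_origins` and `tile_grid` are hypothetical helpers.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    if length <= tile:
        return [0]  # the whole axis fits in a single tile
    starts = list(range(0, length - tile + 1, stride))
    # If the stride skipped the border, add one final tile flush with the edge.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_grid(width: int, height: int,
              tile: int = 1024, stride: int = 512) -> List[Tuple[int, int]]:
    """(x, y) top-left corners of overlapping tiles covering a width x height image."""
    return [(x, y)
            for y in tile_origins(height, tile, stride)
            for x in tile_origins(width, tile, stride)]
```

For a 4000-pixel side this yields seven starts (0, 512, ..., 2560, then 2976, shifted so the last tile ends at the border), giving a 7 × 7 grid of 49 tiles. The overlap means an object cut by one tile boundary can still appear whole in a neighbouring tile, provided it is smaller than the overlap.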

Implications and Future Work

DOTA provides a robust benchmark for object detection in aerial images, which has practical implications in areas such as vehicle counting, remote object tracking, and unmanned driving. The dataset's comprehensive annotations and diversity make it an invaluable resource for the development of more accurate and robust detection algorithms.

Theoretically, DOTA raises interesting questions about the generalization capabilities of object detection models across different domains. The dataset's complexity challenges existing models and exposes weaknesses in handling varied object orientations and scales.

Future developments could include the expansion of DOTA to include more categories and instances, further increasing its coverage and applicability. Additionally, improvements in algorithmic approaches, specifically aimed at addressing the challenges highlighted by DOTA, such as advanced handling of orientation and crowded instances, will be crucial.

In conclusion, DOTA represents a significant advancement in the field of object detection in Earth Vision. Its extensive dataset and rigorous benchmarking provide a solid foundation for ongoing research and development in aerial imagery object detection.
