
A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery (2104.01263v2)

Published 2 Apr 2021 in cs.CV

Abstract: Urban areas consume over two-thirds of the world's energy and account for more than 70 percent of global CO2 emissions. As stated in IPCC's Global Warming of 1.5C report, achieving carbon neutrality by 2050 requires a clear understanding of urban geometry. High-quality building footprint generation from satellite images can accelerate this predictive process and empower municipal decision-making at scale. However, previous Deep Learning-based approaches face consequential issues such as scale invariance and defective footprints, partly due to ever-present class-wise imbalance. Additionally, most approaches require supplemental data such as point cloud data, building height information, and multi-band imagery - which has limited availability and are tedious to produce. In this paper, we propose a modified DeeplabV3+ module with a Dilated Res-Net backbone to generate masks of building footprints from three-channel RGB satellite imagery only. Furthermore, we introduce an F-Beta measure in our objective function to help the model account for skewed class distributions and prevent false-positive footprints. In addition to F-Beta, we incorporate an exponentially weighted boundary loss and use a cross-dataset training strategy to further increase the quality of predictions. As a result, we achieve state-of-the-art performances across three public benchmarks and demonstrate that our RGB-only method produces higher quality visual results and is agnostic to the scale, resolution, and urban density of satellite imagery.

Citations (20)

Summary

  • The paper introduces a modified DeepLabV3+ architecture with a Dilated ResNet backbone that accurately extracts building footprints from RGB satellite imagery.
  • It incorporates the F-Beta measure and exponential weighted boundary loss to mitigate class imbalance and enhance boundary delineation.
  • Results on Urban3D, SpaceNet, and AICrowd demonstrate state-of-the-art accuracy, offering practical benefits for urban planning and policy-making.

Semantic Segmentation Network for Urban-Scale Building Footprint Extraction

Introduction

The paper "A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery" (2104.01263) addresses the extraction of building footprints from RGB satellite imagery with deep learning. The task matters for urban planning, energy management, and climate policy, yet it remains difficult: earlier models are not robust to changes in scale and produce defective footprints, partly because building pixels are heavily outnumbered by background pixels. Previous methods also often rely on costly additional data such as point clouds or multi-band imagery. The authors propose a modified DeepLabV3+ network with a Dilated ResNet backbone that produces accurate building footprints from RGB imagery alone. The method adds an F-Beta measure to the objective function to handle class imbalance and uses an exponentially weighted boundary loss to delineate building edges more accurately.

Methodology

The methodology hinges on a revised DeepLabV3+ architecture with crucial modifications:

  1. Network Architecture: In place of the commonly used U-Net, the authors adopt a modified DeepLabV3+ with a Dilated ResNet backbone. Dilated convolutions enlarge the receptive field without downsampling the feature maps, so fine detail is preserved, which helps separate buildings from their backgrounds in lower-resolution images.
  2. F-Beta Measure: Introduced as part of the objective function, this measure lets the network trade precision against recall through a tunable parameter β (β < 1 weights precision more heavily, β > 1 weights recall more heavily). Tuning β is essential for managing the false positives prevalent in building segmentation tasks.
  3. Exponentially Weighted Boundary Loss (EWC): This loss function heavily penalizes mispredictions near building boundaries to improve the network's ability to distinguish between closely situated buildings. Weight maps guide the network in identifying separate entities where buildings are densely packed.
  4. Cross-Dataset Training: To improve generalization, the network is trained on samples drawn from several datasets. This strategy improves the network's adaptability to the different scales and densities found in satellite images of different urban areas.

    Figure 1: Visualizations of samples from each dataset used in this paper, depicting the breadth of geographic and density variations.
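The two loss terms described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not give the exact formulas, so the soft F-Beta term uses the standard definition with probabilities in place of hard counts, the boundary weights use a simple `1 + w0 * exp(-d / sigma)` decay with distance `d` to the nearest boundary pixel, and the two terms are simply summed. The function names and the parameters `w0`, `sigma`, and `eps` are hypothetical.

```python
import numpy as np


def soft_f_beta_loss(probs, target, beta=1.0, eps=1e-7):
    """Return 1 - soft F-beta for predicted probabilities and a binary mask."""
    tp = np.sum(probs * target)            # soft true positives
    fp = np.sum(probs * (1.0 - target))    # soft false positives
    fn = np.sum((1.0 - probs) * target)    # soft false negatives
    b2 = beta ** 2
    f_beta = ((1 + b2) * tp + eps) / ((1 + b2) * tp + b2 * fn + fp + eps)
    return 1.0 - f_beta


def boundary_weight_map(mask, w0=5.0, sigma=2.0):
    """Assumed weighting: 1 + w0 * exp(-d / sigma), d = distance to a boundary."""
    mask = np.asarray(mask).astype(bool)
    padded = np.pad(mask, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    # A pixel lies on a boundary if any 4-neighbour carries a different label.
    boundary = (
        (center != padded[:-2, 1:-1])
        | (center != padded[2:, 1:-1])
        | (center != padded[1:-1, :-2])
        | (center != padded[1:-1, 2:])
    )
    ys, xs = np.nonzero(boundary)
    if ys.size == 0:                       # no boundary: uniform weights
        return np.ones(mask.shape)
    yy, xx = np.indices(mask.shape)
    # Brute-force Euclidean distance to the nearest boundary pixel
    # (acceptable for small tiles; a distance transform scales better).
    d = np.sqrt((yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2).min(axis=-1)
    return 1.0 + w0 * np.exp(-d / sigma)


def weighted_bce(probs, target, weights, eps=1e-7):
    """Pixel-wise binary cross-entropy, averaged with the given weights."""
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return np.sum(weights * bce) / np.sum(weights)


# Toy 8x8 tile with one 3x3 building and a fairly confident prediction.
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1.0
probs = mask * 0.9 + 0.05
weights = boundary_weight_map(mask)
# Assumed combination: the two terms are added with equal weight.
total = weighted_bce(probs, mask, weights) + soft_f_beta_loss(probs, mask, beta=0.5)
```

With β below 1, the false-positive term dominates the denominator of the F-Beta score, so spurious footprints cost more than missed pixels; raising β above 1 reverses that preference.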

Results and Performance

The proposed architecture achieved state-of-the-art results across three benchmark datasets: Urban3D, SpaceNet, and AICrowd. It outperformed several existing models, including U-Net Ensembles and Mask-RCNNs, demonstrating superior precision and recall, particularly in densely populated urban regions.

Key Results:

  • Urban3D: Reached an F-1 score of 83.4 and an mIOU of 84.5, showing that the method handles scale variation and class imbalance without needing depth data.
  • SpaceNet: Achieved an F-1 score of 92.6, benefiting from both the high spatial resolution of the Vegas-region imagery and the cross-dataset training strategy.
  • AICrowd: Achieved an F-1 score of 96.1, reflecting the high precision gained from refined boundary delineation.

    Figure 2: Avg. accuracy performance of three runs on different values of beta on Urban3D.
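For readers who want to reproduce scores of the kind quoted above, the following minimal sketch computes pixel-wise precision, recall, F-1, and building-class IoU from two binary masks. It rests on assumptions: the summary does not state whether the reported F-1 is pixel-wise or object-wise, and mIOU is conventionally the mean of per-class IoU, which this sketch does not average. The function name `footprint_metrics` is hypothetical.

```python
import numpy as np


def footprint_metrics(pred, target):
    """Pixel-wise precision, recall, F-1 and IoU for the building class."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))     # building predicted and present
    fp = int(np.sum(pred & ~target))    # building predicted, background present
    fn = int(np.sum(~pred & target))    # building missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Convention chosen here: two empty masks count as a perfect match.
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy example: the prediction covers the 2x2 building plus two extra pixels.
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
scores = footprint_metrics(pred, target)
```

For a single class, F-1 and IoU are tied by F-1 = 2·IoU / (1 + IoU), so the two numbers rank predictions identically; they diverge only once IoU is averaged over classes or images.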

Implications and Future Work

The research provides compelling implications for urban modeling applications. By reducing reliance on supplementary data, the method expands applicability to regions where only basic RGB imagery is accessible, democratizing access to advanced urban planning tools.

From a theoretical perspective, the introduction of the F-Beta measure could influence the design of loss functions in other segmentation tasks confronted with class imbalance issues. Practically, refining segmentation accuracy can directly inform municipal decision-makers and accelerate urban energy models, potentially influencing policy and strategic urban development.

Future work could explore tuning the β parameter across diverse datasets to make the network more adaptive, or integrating the method with real-time satellite data feeds for up-to-date footprint extraction.

Conclusion

The paper presents a robust method for building footprint extraction from RGB satellite images that minimizes dependence on costly supplementary datasets. By combining a modified architecture with a novel objective function, the authors advance the state of the art in building segmentation and offer practical tools for urban planners and policy-makers engaged in energy and climate initiatives.

Figure 3: Visualizations of predictions from all datasets by four variants of the proposed method and two baselines, highlighting qualitative improvements.
