DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

Published 3 Apr 2019 in cs.CV | (1904.02216v1)

Abstract: This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of DFANet with 8× less FLOPs and 2× faster than the existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% Mean IOU on the Cityscapes test dataset with only 1.7 GFLOPs and a speed of 160 FPS on one NVIDIA Titan X card, and 71.3% Mean IOU with 3.4 GFLOPs while inferring on a higher resolution image.

Authors (4)

Summary

The paper introduces DFANet, a lightweight CNN that uses innovative deep feature aggregation to enable real-time semantic segmentation.
It uses 8× fewer FLOPs and runs 2× faster than prior state-of-the-art real-time methods while reaching 70–71% Mean IOU on the Cityscapes test set.
Built on a modified Xception backbone with depthwise separable convolutions and multi-scale feature aggregation, DFANet targets applications such as autonomous driving.

The paper presents DFANet, a convolutional neural network (CNN) architecture designed for real-time semantic segmentation under resource constraints. It targets fast inference and high accuracy on high-resolution images, a combination that is critical for applications such as autonomous driving and robot sensing.

Core Contributions

DFANet pairs a lightweight backbone with two feature aggregation strategies. Its core contributions are as follows:

Substantial Reduction in Computational Complexity: DFANet uses 8× fewer FLOPs and runs 2× faster than existing state-of-the-art real-time segmentation approaches while maintaining comparable accuracy: 70.3% Mean Intersection over Union (Mean IOU) on the Cityscapes test set at only 1.7 GFLOPs.
Innovative Feature Aggregation: The network employs two novel feature aggregation strategies:
- Sub-network Aggregation: This method refines prediction results by reusing the high-level features of one sub-network as input to the next.
- Sub-stage Aggregation: By fusing features from corresponding stages of successive sub-networks, DFANet strengthens its feature representation, balancing high-level context against low-level spatial detail.
Modification of Xception for Efficiency: DFANet adapts the Xception network, which is built from depthwise separable convolutions, and adds a fully-connected attention module to enlarge the receptive field with minimal additional computation.
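To make the efficiency argument behind depthwise separable convolutions concrete, the following minimal sketch compares the cost of a standard convolution with that of its depthwise separable counterpart. The layer sizes are hypothetical and chosen only for illustration (stride 1, no bias, output map of size h × w); they are not taken from the paper. The resulting per-layer ratio is also a separate quantity from the 8× FLOPs comparison against other networks.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Parameters and multiply-accumulates of a k x k standard convolution (no bias)."""
    params = k * k * c_in * c_out
    macs = params * h * w  # the full kernel is applied at every output position
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Cost of a k x k depthwise conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


# Hypothetical layer: 3 x 3 kernel, 128 -> 128 channels, 64 x 64 output map.
std_params, std_macs = standard_conv_cost(128, 128, 3, 64, 64)
sep_params, sep_macs = depthwise_separable_cost(128, 128, 3, 64, 64)
ratio = std_macs / sep_macs

print(f"standard:  {std_params} params, {std_macs} MACs")
print(f"separable: {sep_params} params, {sep_macs} MACs")
print(f"per-layer MAC reduction: {ratio:.2f}x")
```

For a 3 × 3 kernel the reduction approaches 9× as the channel count grows, which is why backbones built from these layers are attractive under tight compute budgets.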

The architecture of DFANet comprises a lightweight backbone and cascades of sub-networks and sub-stages, allowing effective feature aggregation that makes full use of multi-scale receptive fields.
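The cascade can be illustrated with a shape-level sketch that tracks only (channels, height, width) tuples rather than real tensors. This is one reading of the two aggregation strategies, not the authors' implementation: the stage widths, the 4× upsampling factor, the number of backbones, and all function names are assumptions made for illustration.

```python
STAGE_CHANNELS = (48, 96, 192)  # assumed stage widths, illustration only
UPSAMPLE = 4                    # assumed factor that re-aligns resolutions


def stage(shape, out_channels):
    """Toy encoder stage: halves the spatial size and sets the output width."""
    _, h, w = shape
    return (out_channels, h // 2, w // 2)


def upsample(shape, factor):
    """Enlarge the spatial size; the channel count is unchanged."""
    c, h, w = shape
    return (c, h * factor, w * factor)


def concat(a, b):
    """Channel-wise concatenation; spatial sizes must match."""
    assert a[1:] == b[1:], f"spatial mismatch: {a} vs {b}"
    return (a[0] + b[0], a[1], a[2])


def backbone(x, previous=None):
    """Run all stages, fusing the previous backbone's stage outputs if given."""
    outputs = []
    for i, channels in enumerate(STAGE_CHANNELS):
        if previous is not None:
            # Sub-stage aggregation: stage i also sees the previous
            # backbone's output from the same stage index.
            x = concat(x, previous[i])
        x = stage(x, channels)
        outputs.append(x)
    return outputs


def cascade(x, num_backbones=3):
    """Chain backbones; each one starts from the upsampled output of the last."""
    previous = None
    trace = []
    for _ in range(num_backbones):
        previous = backbone(x, previous)
        trace.append(previous)
        # Sub-network aggregation: the high-level output becomes the next input.
        x = upsample(previous[-1], UPSAMPLE)
    return trace


trace = cascade((8, 64, 64))  # hypothetical input feature map
for n, outputs in enumerate(trace, start=1):
    print(f"backbone {n}: {outputs}")
```

Under these assumptions every concatenation lines up spatially, because each backbone operates at half the resolution of the one before it. That alignment is what lets features from corresponding stages be fused without extra resampling.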

Experimental Evaluation

Experiments on the Cityscapes and CamVid datasets illustrate DFANet's superior performance, particularly in scenarios that demand real-time processing:

Cityscapes Dataset: At the higher input resolution, DFANet reaches 71.3% Mean IOU with 3.4 GFLOPs at 100 FPS on a Titan X card; the lighter configuration reaches 70.3% Mean IOU with 1.7 GFLOPs at 160 FPS. Together these results set a high standard for the speed-accuracy trade-off in real-time semantic segmentation.
CamVid Dataset: On CamVid, DFANet runs markedly faster than competing methods at the cost of only a slight reduction in segmentation accuracy.
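The accuracy figures above are reported as Mean IOU. As a reminder of how that metric is usually computed, here is a minimal sketch that derives it from a confusion matrix. The matrix values are made up for illustration, and the function is a generic formulation rather than the official evaluation script of either benchmark.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    Rows index the ground-truth class, columns the predicted class.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp     # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Made-up 3-class confusion matrix (pixel counts).
confusion = [
    [50, 5, 5],
    [10, 20, 0],
    [0, 0, 10],
]
score = mean_iou(confusion)
print(f"Mean IoU: {score:.4f}")
```

Because each class contributes equally to the average, small or rare classes weigh as much as large ones, which is why Mean IOU is stricter than plain pixel accuracy.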

Implications and Future Work

DFANet demonstrates that deep feature aggregation and efficient network design can deliver accurate real-time segmentation under tight resource constraints. The approach of aggregating features across stages and sub-networks could be extended to other areas of computer vision and to more complex tasks requiring real-time processing.

Future work could explore adaptive feature aggregation techniques and alternative backbone networks to broaden DFANet's applicability. Optimizing the network for different hardware architectures may also open broader real-time deployment possibilities. Overall, DFANet offers a framework that balances computational constraints with practical application needs in efficient semantic segmentation.
