UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation (1912.05074v2)

Published 11 Dec 2019 in eess.IV, cs.CV, and cs.LG

Abstract: The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is apriori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects -- an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.

Authors (4)
  1. Zongwei Zhou (60 papers)
  2. Md Mahfuzur Rahman Siddiquee (18 papers)
  3. Nima Tajbakhsh (21 papers)
  4. Jianming Liang (24 papers)
Citations (2,258)

Summary

  • The paper introduces an ensemble of U-Nets with varied depths through deep supervision, boosting segmentation accuracy across multiple medical tasks.
  • It redesigns skip connections to fuse multi-scale semantic features effectively, significantly improving convergence and overall performance.
  • The architecture supports a pruning strategy that reduces inference time, making it suitable for deployment on resource-constrained devices.

Overview of "UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation"

The paper presents UNet++, a new neural architecture tailored to address the intrinsic limitations of existing encoder-decoder networks used for medical image segmentation. The proposed architecture aims to enhance segmentation accuracy by redesigning skip connections and embedding U-Nets of varying depths within a unified framework.

Key Contributions

  1. Built-in Ensemble of U-Nets: UNet++ integrates U-Nets of different depths into a single architecture and trains them simultaneously through deep supervision. This removes the need for an exhaustive architecture search to determine the optimal network depth and improves segmentation performance on objects of varying sizes.
  2. Redesigned Skip Connections: The conventional skip connections of encoder-decoder architectures are modified to aggregate feature maps of varying semantic scales. This redesign permits more flexible feature fusion in the decoder sub-networks, improving both segmentation accuracy and convergence speed (a minimal sketch of this aggregation follows the list).
  3. Model Pruning: Deep supervision also enables a pruning scheme that lets the model run in several pruned configurations at inference time. This significantly reduces inference time at the cost of only modest performance degradation, making the model amenable to deployment on resource-constrained devices.
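
To make the redesigned skip connections concrete, the following is a minimal PyTorch-style sketch of a single UNet++ decoder node. It is not the authors' reference implementation; the ConvBlock helper, module names, and channel arithmetic are illustrative assumptions. Each node fuses every same-resolution feature map along its skip pathway with an upsampled feature map from the level below.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and ReLU (a standard U-Net building block)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class NestedDecoderNode(nn.Module):
    """One UNet++ node X^{i,j}: concatenate all same-scale predecessors
    X^{i,0}, ..., X^{i,j-1} with the upsampled output of the deeper node
    X^{i+1,j-1}, then apply a convolution block."""

    def __init__(self, same_scale_channels, below_channels, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        self.conv = ConvBlock(sum(same_scale_channels) + below_channels, out_ch)

    def forward(self, same_scale_feats, below_feat):
        # Dense skip aggregation: every same-resolution feature map on the
        # pathway is fused with the upsampled deeper feature map.
        fused = torch.cat(list(same_scale_feats) + [self.up(below_feat)], dim=1)
        return self.conv(fused)
```

For example, node X^{0,2} would receive [X^{0,0}, X^{0,1}] as same-scale inputs and X^{1,1} as the feature from below; a plain U-Net decoder node at the same resolution would see only the single encoder feature map, which is exactly the restrictive fusion scheme the paper relaxes.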

Experimental Validation

The authors evaluate UNet++ using six different medical image segmentation datasets encompassing computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM). The datasets cover various segmentation tasks ranging from brain tumor and liver segmentation to lung nodule and cell segmentation. The results demonstrate:

  • Consistent Performance Improvement:

    UNet++ consistently outperforms baseline models across all datasets. For instance, in brain tumor segmentation, UNet++ achieves an IoU of 91.21% compared to U-Net's 89.21% (the IoU metric used throughout these comparisons is recapped in a short snippet after this list).

  • Improved Segmentation of Varying-Size Objects:

    By combining the redesigned skip connections with deep supervision, UNet++ segments objects of varying sizes more effectively than its fixed-depth predecessors. This improvement is evidenced by a size-stratified analysis of brain tumors, in which UNet++ outperformed U-Net across all size buckets.

  • Instance Segmentation:

    The paper also extends UNet++ to instance segmentation tasks through Mask RCNN++. The redesigned skip connections incorporated into Mask R-CNN’s feature pyramid enhance instance segmentation performance. For nuclei segmentation, Mask RCNN++ outperforms Mask R-CNN, showing a significant improvement in IoU from 93.28% to 95.10%.
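
For reference, the IoU figures quoted above follow the standard intersection-over-union definition; the snippet below is an illustrative computation for binary masks, not the authors' evaluation script.

```python
import numpy as np

def iou(pred_mask, true_mask, eps=1e-7):
    """Intersection over Union between two binary segmentation masks (illustrative only)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float(intersection + eps) / float(union + eps)
```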

Practical and Theoretical Implications

The practical implications of this research are manifold. Integrating U-Nets of varying depths within a single architecture improves segmentation performance without requiring multiple models to be trained and ensembled separately, thereby saving computational resources. The pruning strategy further enhances UNet++'s utility, especially in contexts requiring real-time processing on mobile devices or edge computing platforms.
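
As a rough sketch of how deep supervision enables this pruning at inference time, consider the hypothetical interface below. The assumption here is that the network returns one segmentation output per embedded U-Net of increasing depth; the actual repository exposes pruning differently, so the function and argument names are illustrative only.

```python
import torch

def predict_pruned(nested_unet, image, prune_level):
    """Inference with a pruned UNet++ (hypothetical interface).

    Assumes `nested_unet(image)` returns a list of deep-supervision outputs,
    one per embedded U-Net of increasing depth. Selecting an earlier head
    corresponds to a shallower, pruned sub-network; in an actual deployment
    the unused deeper nodes would not be instantiated at all, which is where
    the inference-time speedup comes from.
    """
    with torch.no_grad():
        outputs = nested_unet(image)        # e.g. one logit map per embedded U-Net
        logits = outputs[prune_level - 1]   # prune_level=1 -> shallowest sub-network
    return (torch.sigmoid(logits) > 0.5).float()
```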

From a theoretical perspective, the redesign of skip connections proposed in UNet++ offers a novel approach to feature aggregation in encoder-decoder networks. The dense connectivity along the redesigned skip connections facilitates a more effective transfer of semantic features across different scales. Additionally, the collaborative learning enabled by deep supervision across embedded U-Nets of various depths introduces a new paradigm in multi-task learning within a single architectural framework.
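
One hedged way to write the training objective implied by this collaborative learning is the generic deep-supervision formulation below; the head weights w_j and the exact per-head loss are assumptions rather than the paper's precise equation.

```latex
\mathcal{L}_{\text{total}}
  = \sum_{j=1}^{d} w_j\, \ell\!\left(Y,\ \hat{Y}^{\,0,j}\right),
\qquad
\ell(Y, \hat{Y}) = \ell_{\text{CE}}(Y, \hat{Y}) + \ell_{\text{Dice}}(Y, \hat{Y})
```

Here \hat{Y}^{0,j} denotes the output of the supervision head attached to node X^{0,j}, i.e. the prediction of the embedded U-Net of depth j, and \ell is a per-head segmentation loss such as a combination of cross-entropy and Dice terms.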

Future Directions

Several future research directions are suggested by the findings in this paper:

  1. Extended Application Domains: While the current paper focuses on medical image segmentation, the principles underlying UNet++ could be extended to other domains requiring precise segmentation of diverse objects, such as satellite imagery and autonomous driving.
  2. Automated Architecture Search: Building on the benefits of UNet++’s built-in ensemble, future work could incorporate automated architecture search techniques to further optimize the architecture for specific applications and datasets.
  3. Advanced Backbone Networks: The extensibility of UNet++ to various modern backbone architectures (e.g., ResNet, DenseNet) suggests that future research could explore integrating more advanced and specialized backbone networks to further improve segmentation performance.
  4. Self-supervised and Semi-supervised Learning: The deep supervision inherent in UNet++ offers an avenue for incorporating self-supervised and semi-supervised learning strategies, potentially enhancing the performance of the model on datasets with limited labeled examples.

In conclusion, UNet++ offers significant advancements in the field of image segmentation by introducing architectural innovations that optimize the depth and connectivity of encoder-decoder networks. The experimental results corroborate the efficacy of these innovations, marking a notable step forward in both theoretical and practical aspects of medical image segmentation.
