
Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation (2107.12461v1)

Published 26 Jul 2021 in eess.IV and cs.CV

Abstract: The U-Net architecture, built upon the fully convolutional network, has proven to be effective in biomedical image segmentation. However, U-Net applies skip connections to merge semantically different low- and high-level convolutional features, resulting in not only blurred feature maps, but also over- and under-segmented target regions. To address these limitations, we propose a simple, yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, called Sharp U-Net, for binary and multi-class biomedical image segmentation. The key rationale of Sharp U-Net is that instead of applying a plain skip connection, a depthwise convolution of the encoder feature map with a sharpening kernel filter is employed prior to merging the encoder and decoder features, thereby producing a sharpened intermediate feature map of the same size as the encoder map. Using this sharpening filter layer, we are able to not only fuse semantically less dissimilar features, but also to smooth out artifacts throughout the network layers during the early stages of training. Our extensive experiments on six datasets show that the proposed Sharp U-Net model consistently outperforms or matches the recent state-of-the-art baselines in both binary and multi-class segmentation tasks, while adding no extra learnable parameters. Furthermore, Sharp U-Net outperforms baselines that have more than three times the number of learnable parameters.

Authors (2)
  1. Hasib Zunair (15 papers)
  2. A. Ben Hamza (30 papers)
Citations (185)

Summary

  • The paper introduces Sharp U-Net, which mitigates feature fusion issues in traditional U-Net by using depthwise convolution with a sharpening filter.
  • Benchmarking on multiple biomedical datasets shows that Sharp U-Net outperforms or matches state-of-the-art models in challenging segmentation tasks.
  • The approach refines spatial feature alignment without additional parameters, offering broad applicability in diverse encoder-decoder architectures.

An Evaluation of Sharp U-Net for Biomedical Image Segmentation

The paper "Sharp U-Net: Depthwise Convolutional Network for Biomedical Image Segmentation" presents a novel approach to improving the U-Net architecture commonly used for biomedical image segmentation. The proposed Sharp U-Net addresses a specific shortcoming of the classical U-Net by applying a depthwise convolution with a sharpening filter, aiming to mitigate the mismatch that arises when skip connections fuse semantically different features.

Overview of Sharp U-Net

Sharp U-Net introduces a simple architectural enhancement to the classical U-Net framework. Rather than merging encoder and decoder features through a plain skip connection, Sharp U-Net inserts a sharpening filter layer: a depthwise convolution that processes each channel of the encoder feature map independently with a fixed spatial sharpening kernel, producing a sharpened intermediate map of the same size. This is intended to improve spatial feature alignment, reduce the semantic dissimilarity between the fused encoder and decoder features, and smooth out artifacts across network layers during the early stages of training.
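The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the specific 3×3 Laplacian-based sharpening kernel and the function names are assumptions, but the structure (a fixed depthwise convolution of the encoder map, followed by the usual channel-wise concatenation with the decoder map) follows the paper's description.

```python
import numpy as np

# A standard 3x3 sharpening kernel (Laplacian-based). The paper uses a fixed
# sharpening filter; this particular kernel is an illustrative assumption.
SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)

def depthwise_sharpen(fmap: np.ndarray) -> np.ndarray:
    """Convolve each channel of an (H, W, C) feature map with SHARPEN
    independently, using 'same' zero padding. No learnable parameters."""
    h, w, _ = fmap.shape
    padded = np.pad(fmap, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(fmap)
    # Correlation with the (symmetric) kernel, applied to every channel at once.
    for i in range(3):
        for j in range(3):
            out += SHARPEN[i, j] * padded[i:i + h, j:j + w, :]
    return out

def sharp_skip(encoder_fmap: np.ndarray, decoder_fmap: np.ndarray) -> np.ndarray:
    """Sharp U-Net skip connection: sharpen the encoder map depthwise,
    then concatenate along the channel axis as in a plain U-Net skip."""
    return np.concatenate([depthwise_sharpen(encoder_fmap), decoder_fmap], axis=-1)
```

Because the kernel weights sum to 1, a flat region passes through unchanged while edges are amplified, which is the sense in which the encoder features are "sharpened" before fusion. Since the kernel is fixed, the layer adds no learnable parameters, consistent with the paper's claim.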

Performance Improvements

Extensive experimentation on six biomedical datasets—Lung Segmentation, Data Science Bowl 2018, ISIC-2018, COVID-19 CT Segmentation, ISBI-2012, and CVC-ClinicDB—demonstrates that Sharp U-Net consistently outperforms or matches existing state-of-the-art models in both binary and multi-class segmentation tasks. Notably, Sharp U-Net achieves marked improvements over baselines such as U-Net, Wide U-Net, and U-Net variants with pre-trained encoders like ResNet-50 and VGG. The paper reports significant gains on highly challenging images characterized by homogeneous regions shared between foreground and background, underlining the robustness of Sharp U-Net against failure modes such as over- and under-segmentation.

Implications and Future Directions

Sharp U-Net improves segmentation precision without introducing additional learnable parameters, and the approach is applicable to many encoder-decoder architectures beyond U-Net. Because the sharpening filter layer is a drop-in replacement for a plain skip connection, it may generalize readily across different network configurations, broadening its utility in diverse medical imaging tasks. The insights gained from improving spatial feature alignment could also inform methodologies for precise segmentation in multimodal imaging environments.

The paper also hints at future directions, including deeper architectures for volumetric medical image analysis and further techniques for bridging the semantic gap between multi-level feature extractions. Given the robustness demonstrated by Sharp U-Net, there is substantial room for exploring multi-task learning setups that leverage the proposed architectural enhancement.

Conclusion

The contributions of the Sharp U-Net, in terms of architectural innovation and segmentation accuracy, offer a compelling advancement for biomedical image analysis. Through its integration of sharpening spatial filters, Sharp U-Net presents a sophisticated strategy for achieving refined image segmentation, promising improved diagnostic capabilities in clinical practice and research environments.