Recalibrating Fully Convolutional Networks with Spatial and Channel 'Squeeze & Excitation' Blocks (1808.08127v1)

Published 23 Aug 2018 in cs.CV

Abstract: In a wide range of semantic segmentation tasks, fully convolutional neural networks (F-CNNs) have been successfully leveraged to achieve state-of-the-art performance. Architectural innovations of F-CNNs have mainly been on improving spatial encoding or network connectivity to aid gradient flow. In this article, we aim towards an alternate direction of recalibrating the learned feature maps adaptively; boosting meaningful features while suppressing weak ones. The recalibration is achieved by simple computational blocks that can be easily integrated in F-CNNs architectures. We draw our inspiration from the recently proposed 'squeeze & excitation' (SE) modules for channel recalibration for image classification. Towards this end, we introduce three variants of SE modules for segmentation, (i) squeezing spatially and exciting channel-wise, (ii) squeezing channel-wise and exciting spatially and (iii) joint spatial and channel 'squeeze & excitation'. We effectively incorporate the proposed SE blocks in three state-of-the-art F-CNNs and demonstrate a consistent improvement of segmentation accuracy on three challenging benchmark datasets. Importantly, SE blocks only lead to a minimal increase in model complexity of about 1.5%, while the Dice score increases by 4-9% in the case of U-Net. Hence, we believe that SE blocks can be an integral part of future F-CNN architectures.

Authors (3)

Abhijit Guha Roy
Nassir Navab
Christian Wachinger

Summary

The paper introduces novel SE blocks that recalibrate spatial and channel features to significantly improve segmentation performance.
It integrates cSE, sSE, and scSE modules into three existing F-CNNs; in U-Net, the Dice score improves by 4-9% for roughly 1.5% more model complexity.
Experimental validation on brain MRI, CT, and retinal OCT datasets highlights the practical benefits of SE blocks in challenging clinical imaging tasks.

Evaluating the Efficacy of Spatial and Channel Squeeze-Excitation Blocks in Fully Convolutional Networks for Image Segmentation

The paper proposes improving Fully Convolutional Neural Networks (F-CNNs) for semantic segmentation by adding small computational units called 'squeeze & excitation' (SE) blocks. Rather than changing spatial encoding or network connectivity, these blocks adaptively recalibrate the learned feature maps, boosting meaningful features and suppressing weak ones. The recalibration acts along the channel dimension, the spatial dimensions, or both, and builds on the SE modules recently proposed for channel recalibration in image classification.

Methodological Innovations

The authors propose three variants of SE blocks, demonstrating their integration into existing F-CNN architectures:

Channel Squeeze-Excitation (cSE) Block: Inspired by the SE modules designed for image classification, cSE squeezes each feature map spatially with global average pooling and uses the resulting global descriptor to excite, that is rescale, the individual channels.
Spatial Squeeze-Excitation (sSE) Block: This newly introduced block aims at exploiting spatial information, beneficial for the fine-grained segmentation requirements typical in medical imaging. It achieves spatial focus by squeezing along channels and exciting spatial locations.
Spatial and Channel Squeeze-Excitation (scSE) Block: A synthesis of the previous two, combining channel and spatial recalibration to exploit the unique benefits of both components.
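
The three variants above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer bottleneck with reduction ratio `r`, the ReLU and sigmoid activations, and the element-wise maximum used to merge the two branches in `scse` are choices made for this sketch, and all function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cse(u, w1, b1, w2, b2):
    """Channel SE: squeeze spatially, excite channel-wise. u has shape (C, H, W)."""
    z = u.mean(axis=(1, 2))                # global average pooling -> (C,)
    hidden = np.maximum(w1 @ z + b1, 0.0)  # bottleneck with ReLU -> (C // r,)
    gate = sigmoid(w2 @ hidden + b2)       # one weight in (0, 1) per channel
    return u * gate[:, None, None]

def sse(u, w, b):
    """Spatial SE: squeeze along channels (a 1x1 convolution), excite spatially."""
    q = np.einsum("c,chw->hw", w, u) + b   # project C channels to one map (H, W)
    return u * sigmoid(q)[None, :, :]      # one weight in (0, 1) per pixel

def scse(u, cse_params, sse_params):
    """Joint block; merging by element-wise maximum is an assumption of this sketch."""
    return np.maximum(cse(u, *cse_params), sse(u, *sse_params))

# Toy usage with random weights (C=8 channels, reduction ratio r=2, both assumed).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
u = rng.standard_normal((C, H, W))
cse_params = (rng.standard_normal((C // r, C)), np.zeros(C // r),
              rng.standard_normal((C, C // r)), np.zeros(C))
sse_params = (rng.standard_normal(C), 0.0)
out = scse(u, cse_params, sse_params)      # same shape as u
```

Because both gates lie strictly between 0 and 1, each block only rescales the input feature map and never changes its shape, which is why such a block can be placed after an existing layer without altering the rest of the network.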

Integrating these SE blocks into three state-of-the-art F-CNN architectures (U-Net, SD-Net, and FC-DenseNet) yields consistent improvements on all three segmentation datasets. In U-Net, the Dice score rises by 4-9% while model complexity grows by only about 1.5%.
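
To see why the added complexity is small, it helps to count the parameters a single block would add. The sketch below assumes a cSE bottleneck of two fully connected layers with biases and reduction ratio `r`, and an sSE block made of one 1x1 convolution with a bias. These are assumptions, the function names are hypothetical, and the numbers are not meant to reproduce the overhead reported in the paper, which depends on the full architecture.

```python
def cse_param_count(c, r=2):
    """Two fully connected layers: C -> C // r and C // r -> C, each with biases."""
    hidden = c // r
    return (c * hidden + hidden) + (hidden * c + c)

def sse_param_count(c):
    """One 1x1 convolution from C channels to 1 channel, plus a bias."""
    return c + 1

def scse_param_count(c, r=2):
    return cse_param_count(c, r) + sse_param_count(c)

def conv_param_count(c_in, c_out, k):
    """A standard k x k convolution with biases, for comparison."""
    return k * k * c_in * c_out + c_out

extra = scse_param_count(64)           # 4257 for C=64, r=2
base = conv_param_count(64, 64, 3)     # 36928 for a 3x3 convolution with 64 channels
```

The spatial branch costs only `C + 1` parameters, so nearly all of the overhead comes from the channel branch and can be reduced further with a larger reduction ratio.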

Experimental Validation and Results

The efficacy of the proposed SE blocks is evaluated across three diverse medical imaging segmentation tasks—brain MRI segmentation, whole-body CT segmentation, and retinal OCT segmentation. The inclusion of SE blocks enhances segmentation accuracy consistently, validated through Dice score comparisons. Specifically, the paper shows that:

In Brain MRI Segmentation: The proposed scSE blocks improve segmentation quality most clearly for small and irregularly shaped brain structures, which the baseline models segmented poorly.
In Whole-Body CT Segmentation: Despite higher baseline scores, the addition of SE blocks brought further improvements, pointing to the robustness of the SE methodology even when tackling complex organ delineation tasks.
In Retinal OCT Segmentation: The scSE block significantly boosted performance in identifying fine structures, such as fluid pockets, suggesting that it is well suited to small, indistinct features.
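
The comparisons above rely on the Dice score, which measures the overlap between a predicted and a reference segmentation for one class. A minimal per-class sketch follows; the function name and the toy label maps are illustrative, and returning NaN when a class is absent from both maps is a convention chosen here, not something taken from the paper.

```python
import numpy as np

def dice_score(pred, target, label):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for the pixels assigned to `label`."""
    p = (pred == label)
    t = (target == label)
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")  # class absent from both maps: score undefined
    return 2.0 * np.logical_and(p, t).sum() / denom

# Toy 2x3 label maps: class 1 covers 3 pixels in each map, 2 of them shared.
pred = np.array([[0, 1, 1],
                 [0, 1, 0]])
target = np.array([[0, 1, 0],
                   [0, 1, 1]])
score = dice_score(pred, target, label=1)  # 2 * 2 / (3 + 3) = 0.666...
```

For small structures a few misclassified pixels change the score noticeably, which is consistent with the largest gains being reported on small anatomical structures.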

Implications and Future Directions

The findings suggest that SE blocks could become integral components of F-CNN architectures, since they improve segmentation accuracy without a substantial increase in computational cost. This makes them particularly attractive for medical imaging, where precision and reliability are critical. Because the blocks fit into existing network structures, they may also apply to computer vision tasks beyond medical imaging that pose similar segmentation challenges, such as autonomous driving or geological analysis.

The work provides a foundation for future research on recalibration mechanisms in neural networks, with an emphasis on the interplay between spatial and channel information. A better understanding of how SE blocks behave during training could inform more sophisticated recalibration methods, potentially tailored to specific tasks or datasets. As neural network applications diversify, such modular enhancements could play an important role in improving model performance across domains.