SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation
Abstract: The adoption of Vision Transformer (ViT)-based architectures represents a significant advancement in 3D Medical Image (MI) segmentation, surpassing traditional Convolutional Neural Network (CNN) models by enhancing global contextual understanding. While this paradigm shift has significantly enhanced 3D segmentation performance, state-of-the-art architectures are extremely large and complex, and they require large-scale computing resources for training and deployment. Furthermore, with the limited datasets often encountered in medical imaging, larger models can present hurdles in both generalization and convergence. In response to these challenges, and to demonstrate that lightweight models are a valuable area of research in 3D medical imaging, we present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. Additionally, SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features to produce highly accurate segmentation masks. The proposed memory-efficient Transformer preserves the performance characteristics of a significantly larger model in a compact design. SegFormer3D democratizes deep learning for 3D medical image segmentation by offering a model with 33x fewer parameters and a 13x reduction in GFLOPS compared to the current state-of-the-art (SOTA). We benchmark SegFormer3D against the current SOTA models on three widely used datasets, Synapse, BraTS, and ACDC, achieving competitive results. Code: https://github.com/OSUPCVLab/SegFormer3D.git