PMFSNet: Polarized Multi-scale Feature Self-attention Network For Lightweight Medical Image Segmentation (2401.07579v1)
Abstract: Current state-of-the-art medical image segmentation methods prioritize accuracy, often at the expense of increased computational demands and larger model sizes. Applying these large-scale models to the relatively limited scale of medical image datasets tends to induce redundant computation, complicating the process without delivering the corresponding benefits. This approach not only adds complexity but also presents challenges for the integration and deployment of lightweight models on edge devices. For instance, recent transformer-based models have excelled in 2D and 3D medical image segmentation due to their extensive receptive fields and high parameter counts. However, they risk overfitting when applied to small datasets and often neglect the vital inductive biases of Convolutional Neural Networks (CNNs), which are essential for local feature representation. In this work, we propose PMFSNet, a novel medical image segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical of larger models. PMFSNet streamlines the UNet-based hierarchical structure and reduces the computational complexity of the self-attention mechanism, making it suitable for lightweight applications. It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies. Extensive experimental results demonstrate that, even with fewer than 1 million parameters, our method achieves superior performance in various segmentation tasks across different data scales. It achieves IoU scores of 84.68%, 82.02%, and 78.82% on public datasets of teeth CT (CBCT), ovarian tumor ultrasound (MMOTU), and skin lesion dermoscopy images (ISIC 2018), respectively. The source code is available at https://github.com/yykzjh/PMFSNet.
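For readers unfamiliar with the IoU (Intersection over Union) metric reported in the abstract, the following is a minimal sketch of how it can be computed for a pair of binary masks. It is an illustration only, not the evaluation code from the PMFSNet repository: the function name `iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made here.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over Union for two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # A pixel is in the intersection if both masks mark it,
            # and in the union if at least one mask marks it.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are marked in total.
pred = [
    [0, 1, 1],
    [0, 1, 0],
]
target = [
    [0, 1, 0],
    [1, 1, 0],
]
print(f"IoU = {iou(pred, target):.2f}")  # prints "IoU = 0.50"
```

Dataset-level scores such as those quoted above are typically aggregated over many images or volumes; how the paper aggregates them is not stated in the abstract.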
Authors: Jiahui Zhong, Wenhong Tian, Yuanlun Xie, Zhijia Liu, Jie Ou, Taoran Tian, Lei Zhang