MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology (2403.06800v1)
Abstract: Multiple Instance Learning (MIL) has emerged as a dominant paradigm for extracting discriminative feature representations from Whole Slide Images (WSIs) in computational pathology. Despite notable progress, existing MIL approaches struggle to support comprehensive and efficient interactions among instances. They also suffer from time-consuming computation and overfitting. In this paper, we incorporate the Selective Scan State Space Sequential Model (Mamba) into MIL for long sequence modeling with linear complexity, a framework we term MambaMIL. By inheriting the capabilities of vanilla Mamba, MambaMIL can comprehensively understand and perceive long sequences of instances. Furthermore, we propose Sequence Reordering Mamba (SR-Mamba), which is aware of the order and distribution of instances and exploits the valuable information inherent in long sequences. With SR-Mamba as its core component, MambaMIL captures more discriminative features and mitigates both overfitting and high computational overhead. Extensive experiments on two challenging public tasks across nine diverse datasets demonstrate that our framework performs favorably against state-of-the-art MIL methods. The code is released at https://github.com/isyangshu/MambaMIL.
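The abstract combines two ideas: a linear-time selective scan over the sequence of instances, and a second pass over a reordered copy of that sequence. The toy sketch below illustrates both. It is not the released MambaMIL code. The following are all illustrative assumptions: the scalar single-channel state, the parameters `a`, `b`, `c`, `w`, `bias`, the softplus step size, the strided reordering rule with rate `R`, and merging the two passes by summation. The helpers `selective_scan`, `reorder_indices`, and `sr_block` are hypothetical names. The actual model operates on high-dimensional patch features with learned, multi-channel projections.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: Sequence[float], a: float = 1.0, b: float = 1.0,
                   c: float = 1.0, w: float = 1.0, bias: float = 0.0) -> List[float]:
    """Toy single-channel selective scan (assumption, not the official kernel).

    One left-to-right pass, so time is O(L) in the number of instances.
    The step size delta_t depends on the input x_t (the "selective" part):
    the state decays by exp(-delta_t * a) and absorbs delta_t * b * x_t.
    """
    h = 0.0
    ys = []
    for xt in x:
        delta = softplus(w * xt + bias)                 # input-dependent step size
        h = math.exp(-delta * a) * h + delta * b * xt   # discretized state update
        ys.append(c * h)                                # per-instance readout
    return ys


def reorder_indices(length: int, rate: int) -> List[int]:
    """Hypothetical strided reordering: positions r, r+R, r+2R, ... for each r < R.

    Equivalent to padding the sequence to a multiple of R, reshaping it to
    (L/R, R), transposing, and flattening, with the padding slots dropped.
    """
    return [i for r in range(rate) for i in range(r, length, rate)]


def sr_block(x: Sequence[float], rate: int) -> List[float]:
    """Assumed SR-style block: scan the original order and the reordered order,
    scatter the second result back to the original positions, and sum."""
    perm = reorder_indices(len(x), rate)
    forward = selective_scan(x)
    reordered_out = selective_scan([x[i] for i in perm])
    restored = [0.0] * len(x)
    for out_pos, src in enumerate(perm):
        restored[src] = reordered_out[out_pos]          # undo the permutation
    return [f + r for f, r in zip(forward, restored)]
```

For example, `reorder_indices(7, 3)` returns `[0, 3, 6, 1, 4, 2, 5]`, so the second pass places instances that were three positions apart next to each other. Each pass is still a single O(L) loop, in contrast to the O(L²) pairwise interactions of self-attention.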