Spectral Compressive Imaging Reconstruction Using Convolution and Contextual Transformer (2201.05768v4)
Abstract: Spectral compressive imaging (SCI) encodes a high-dimensional hyperspectral image into a single 2D measurement and then uses algorithms to reconstruct the spatio-spectral data cube. At present, the main bottleneck of SCI is the reconstruction algorithm: state-of-the-art (SOTA) reconstruction methods generally suffer from long reconstruction time, poor detail recovery, or both. In this paper, we propose a novel hybrid network module, the CCoT (Convolution and Contextual Transformer) block, which combines the inductive bias of convolution with the powerful modeling ability of the transformer and thereby helps restore fine details and improve reconstruction quality. We integrate the proposed CCoT block into a deep unfolding framework based on the generalized alternating projection (GAP) algorithm, yielding the GAP-CCoT network. In extensive experiments on synthetic and real data, our proposed model outperforms existing SOTA algorithms by a large margin, achieving higher reconstruction quality (>2 dB in PSNR on simulated benchmark datasets) and shorter running time. The code and models are publicly available at https://github.com/ucaswangls/GAP-CCoT.
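As a rough illustration of the deep unfolding idea mentioned in the abstract, the following NumPy sketch alternates a generalized alternating projection step with a denoising step. It is a minimal sketch under stated assumptions, not the authors' implementation: the forward model is simplified to a mask-weighted sum over spectral bands (the dispersion shift of a real coded aperture system is omitted), `denoiser` is a hypothetical placeholder standing in for a learned module such as a CCoT-based network, and the function names and `num_stages` value are illustrative only.

```python
import numpy as np

def forward(x, masks):
    """Simplified SCI forward model (assumption): sum of mask-modulated bands.
    x, masks have shape (bands, H, W); the measurement has shape (H, W)."""
    return (masks * x).sum(axis=0)

def transpose(y, masks):
    """Adjoint of the simplified forward model: spread y back over the bands."""
    return masks * y[None, :, :]

def gap_projection(v, y, masks, eps=1e-8):
    """Euclidean projection of v onto the set {x : forward(x) = y}.
    For this model Phi Phi^T is diagonal, so the inverse is an element-wise division."""
    phi_phi_t = (masks ** 2).sum(axis=0)
    residual = (y - forward(v, masks)) / (phi_phi_t + eps)
    return v + transpose(residual, masks)

def gap_reconstruct(y, masks, denoiser, num_stages=9):
    """Unfolded GAP loop: each stage is one projection followed by one denoising step.
    `denoiser` is a hypothetical callable mapping a (bands, H, W) cube to a cube."""
    v = transpose(y, masks)  # common initialization; an assumption here
    for _ in range(num_stages):
        x = gap_projection(v, y, masks)
        v = denoiser(x)
    return v
```

In a learned unfolding network, each stage would typically have its own trainable denoiser; the identity function can be passed as `denoiser` to check the projection logic in isolation.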