
Rethinking Cross-Attention for Infrared and Visible Image Fusion (2401.11675v1)

Published 22 Jan 2024 in eess.IV

Abstract: The salient information of an infrared image and the abundant texture of a visible image can be fused to obtain a comprehensive image. Current fusion methods based on Transformer techniques for infrared and visible (IV) images have exhibited promising performance. However, the attention mechanisms of previous Transformer-based methods were prone to extracting common information from the source images while overlooking their discrepancy information, which limited fusion performance. In this paper, by reevaluating the cross-attention mechanism, we propose an alternate Transformer fusion network (ATFuse) to fuse IV images. ATFuse consists of one discrepancy information injection module (DIIM) and two alternate common information injection modules (ACIIM). The DIIM is designed by modifying the vanilla cross-attention mechanism to promote the extraction of discrepancy information from the source images. The ACIIM, in turn, applies the vanilla cross-attention mechanism alternately, fully mining common information and integrating long-range dependencies. Moreover, the training of ATFuse is facilitated by a proposed segmented pixel loss function, which provides a good trade-off between texture-detail and salient-structure preservation. Qualitative and quantitative results on public datasets indicate that ATFuse is effective and superior to other state-of-the-art methods.
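The abstract's two module types map naturally onto small cross-attention blocks. Below is a minimal PyTorch sketch of what a discrepancy-injection module and an alternating common-information module could look like. The names DIIM and ACIIM come from the paper, but everything else, including the shared-weight alternation and the "input minus cross-attended common part" reading of DIIM, is an illustrative assumption rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Vanilla cross-attention: tokens of x query the tokens of y."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, y):
        # The attended output gathers the content of y that matches x,
        # i.e. information common to both modalities.
        common, _ = self.attn(query=x, key=y, value=y)
        return self.norm(x + common)

class DIIM(nn.Module):
    """Assumed discrepancy-information injection: keep what the other
    modality does NOT explain by subtracting the cross-attended part."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, y):
        common, _ = self.attn(query=x, key=y, value=y)
        return self.norm(x - common)  # residual emphasises modality-specific detail

class ACIIM(nn.Module):
    """Assumed alternate common-information injection: vanilla cross-attention
    applied with the query and key/value roles swapped at each step."""
    def __init__(self, dim, num_heads=4, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList(CrossAttention(dim, num_heads) for _ in range(depth))

    def forward(self, x, y):
        for blk in self.blocks:
            x, y = blk(x, y), blk(y, x)  # each side updated from the other per step
        return x + y

# Toy usage on flattened patch tokens (batch=1, 196 tokens, dim=64)
ir  = torch.randn(1, 196, 64)   # infrared features
vis = torch.randn(1, 196, 64)   # visible features
disc = DIIM(64)(ir, vis)        # discrepancy-enhanced infrared stream
fused = ACIIM(64)(disc, vis)    # common information injected alternately
print(fused.shape)              # torch.Size([1, 196, 64])
```

In the paper, the actual modules also include feed-forward sublayers and are trained with the proposed segmented pixel loss; this sketch only illustrates the attention routing the abstract describes.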

Authors (6)
  1. Lihua Jian (3 papers)
  2. Songlei Xiong (1 paper)
  3. Han Yan (108 papers)
  4. Xiaoguang Niu (3 papers)
  5. Shaowu Wu (3 papers)
  6. Di Zhang (231 papers)
Citations (1)
