Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System (2403.16227v1)
Abstract: Infrared and visible image fusion (IVF) plays an important role in intelligent transportation systems (ITS). Early works predominantly focus on boosting the visual appeal of the fused result, and only a few recent approaches have attempted to combine high-level vision tasks with IVF. However, these approaches prioritize cascaded structures that seek unified features suitable for different tasks, and thus tend to bias toward reconstructing raw pixels without considering the significance of semantic features. We therefore propose a novel prior-semantic-guided image fusion method based on a dual-modality strategy, improving the performance of IVF in ITS. Specifically, to explore the independent, significant semantics of each modality, we first design two parallel semantic segmentation branches with a refined feature adaptive-modulation (RFaM) mechanism. RFaM perceives the features that are sufficiently semantically distinct within each segmentation branch. Two pilot experiments based on these branches then capture the significant prior semantics of the two images, which are subsequently applied to guide the fusion task by integrating the semantic segmentation branches with the fusion branches. In addition, to aggregate both high-level semantics and impressive visual effects, we further investigate the frequency response of the prior semantics and propose a multi-level representation-adaptive fusion (MRaF) module that explicitly integrates the low-frequency prior semantics with the high-frequency details. Extensive experiments on two public datasets demonstrate the superiority of our method over state-of-the-art image fusion approaches, in terms of both visual appeal and high-level semantics.
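To make the frequency-aware fusion idea concrete, the following is a minimal, illustrative PyTorch sketch of a block that splits each modality's feature map into a low-frequency (smoothed, semantics-carrying) component and a high-frequency residual (edges and texture), then recombines them with learned mixing layers. It is an assumption-laden sketch of the general MRaF idea described in the abstract, not the authors' implementation; the module name `LowHighFusion`, the average-pooling low-pass filter, and the 1x1-convolution mixing are all hypothetical choices.

```python
# Illustrative sketch only: low-/high-frequency split-and-fuse for two modalities,
# in the spirit of the MRaF module described above. All names and design choices
# here are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowHighFusion(nn.Module):
    """Fuse infrared and visible feature maps by separating each into a
    low-frequency component (coarse semantics) and a high-frequency residual
    (fine details), then mixing the two streams with learned 1x1 convolutions."""

    def __init__(self, channels: int, blur_kernel: int = 5):
        super().__init__()
        self.blur_kernel = blur_kernel
        # Learned mixing of the concatenated low-frequency components.
        self.low_mix = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Learned mixing of the concatenated high-frequency residuals.
        self.high_mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def _split(self, feat: torch.Tensor):
        # Simple low-pass filter via average pooling (a stand-in for any smoothing prior).
        low = F.avg_pool2d(feat, self.blur_kernel, stride=1,
                           padding=self.blur_kernel // 2, count_include_pad=False)
        high = feat - low  # residual keeps the fine details
        return low, high

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        low_ir, high_ir = self._split(feat_ir)
        low_vis, high_vis = self._split(feat_vis)
        fused_low = self.low_mix(torch.cat([low_ir, low_vis], dim=1))
        fused_high = self.high_mix(torch.cat([high_ir, high_vis], dim=1))
        # Explicitly recombine low-frequency semantics with high-frequency details.
        return fused_low + fused_high


if __name__ == "__main__":
    block = LowHighFusion(channels=64)
    feat_ir = torch.randn(1, 64, 128, 128)   # infrared feature map
    feat_vis = torch.randn(1, 64, 128, 128)  # visible feature map
    print(block(feat_ir, feat_vis).shape)    # torch.Size([1, 64, 128, 128])
```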
Authors: Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Lixin Cui, Edwin R. Hancock