CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion

Published 26 Nov 2022 in cs.CV | (2211.14461v2)

Abstract: Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlight and detailed textures. To tackle the challenge in modeling cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. Firstly, CDDFuse uses Restormer blocks to extract cross-modality shallow features. We then introduce a dual-branch Transformer-CNN feature extractor with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Networks (INN) blocks focusing on extracting high-frequency local information. A correlation-driven loss is further proposed to make the low-frequency features correlated while the high-frequency features uncorrelated based on the embedded information. Then, the LT-based global fusion and INN-based local fusion layers output the fused image. Extensive experiments demonstrate that our CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion. We also show that CDDFuse can boost the performance in downstream infrared-visible semantic segmentation and object detection in a unified benchmark. The code is available at https://github.com/Zhaozixiang1228/MMIF-CDDFuse.


Authors (8)

Citations (208)


Summary

The paper presents a correlation-driven dual-branch network that combines Transformer and CNN components to extract low-frequency, modality-shared features and high-frequency, modality-specific details.
The paper introduces a decomposition loss that increases the correlation between the modalities' low-frequency features while decreasing the correlation between their high-frequency features, and it reports strong results on infrared-visible and medical image fusion tasks.
The paper uses a two-stage training process (input reconstruction first, then fusion) and reports improved results in downstream object detection and semantic segmentation.

Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion: A Comprehensive Analysis

The paper introduces CDDFuse, a network for multi-modality (MM) image fusion. It uses a correlation-driven, dual-branch design to decompose each input into modality-shared and modality-specific features, and it is evaluated on infrared-visible and medical image fusion.

Key Contributions

The paper outlines several critical contributions to the MM image fusion domain:

Dual-Branch Transformer-CNN Architecture:
- CDDFuse combines Transformer and CNN components to capture low-frequency global features and high-frequency local features. After Restormer blocks extract shallow cross-modality features, Lite Transformer (LT) blocks extract the base (low-frequency) features through long-range attention, and Invertible Neural Network (INN) blocks extract the detail (high-frequency) features without information loss.
Correlation-Driven Decomposition Loss:
- A loss function is introduced to separate shared from modality-specific information. It pushes the low-frequency features of the two modalities to be correlated and their high-frequency features to be uncorrelated, so the low-frequency branch carries modality-shared content and the high-frequency branch carries modality-specific content.
Two-Stage Training Process:
- Training proceeds in two stages: the model is first trained to reconstruct its inputs, and the fusion layers are then trained in a second stage. The paper reports that this improves the robustness of feature extraction and fusion.
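The correlation-driven loss described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the summary does not state the exact formula, so the form used here (squared detail correlation divided by base correlation plus a constant) is an assumption, as are the function names, the use of Pearson correlation on flattened feature vectors, and the value of `eps`. Plain Python lists stand in for feature tensors to keep the example self-contained.

```python
import math
import random


def pearson_cc(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Small constant guards against division by zero for constant inputs.
    return cov / (math.sqrt(var_a * var_b) + 1e-12)


def decomposition_loss(base_a, base_b, detail_a, detail_b, eps=1.01):
    """Hypothetical decomposition loss for two modalities a and b.

    Small when the base (low-frequency) features are strongly correlated
    and the detail (high-frequency) features are uncorrelated.
    eps > 1 is an assumed constant that keeps the denominator positive,
    because a correlation coefficient lies in [-1, 1].
    """
    cc_base = pearson_cc(base_a, base_b)
    cc_detail = pearson_cc(detail_a, detail_b)
    return cc_detail ** 2 / (cc_base + eps)


# Synthetic features: the base vectors share a common signal,
# the detail vectors are independent noise.
random.seed(0)
shared = [random.gauss(0, 1) for _ in range(1000)]
base_a = [s + 0.1 * random.gauss(0, 1) for s in shared]
base_b = [s + 0.1 * random.gauss(0, 1) for s in shared]
detail_a = [random.gauss(0, 1) for _ in range(1000)]
detail_b = [random.gauss(0, 1) for _ in range(1000)]

well_decomposed = decomposition_loss(base_a, base_b, detail_a, detail_b)
# Swapping the roles simulates a poor decomposition:
# uncorrelated "base" features and correlated "detail" features.
poorly_decomposed = decomposition_loss(detail_a, detail_b, base_a, base_b)
print(well_decomposed < poorly_decomposed)
```

Squaring the detail correlation penalizes positive and negative correlation alike, while the base correlation enters the denominator so that higher values lower the loss. In a real training loop the same computation would run on feature tensors in an automatic-differentiation framework.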

Results and Evaluation

The efficacy of CDDFuse is demonstrated through extensive experiments across several datasets and image fusion tasks, including infrared-visible image fusion (IVF) and medical image fusion (MIF):

Infrared-Visible Fusion: On the MSRS and RoadScene datasets, CDDFuse is reported to outperform prior methods such as DIDFuse and U2Fusion on metrics including entropy (EN), spatial frequency (SF), and structural similarity (SSIM).
Medical Image Fusion: When applied to MRI-CT, MRI-PET, and MRI-SPECT datasets, CDDFuse maintains competitive performance, illustrating its versatility and generalization capability even without specific fine-tuning for medical images.
Downstream Tasks: The paper also reports that using CDDFuse's fused images improves infrared-visible object detection and semantic segmentation, which suggests the fusion quality carries over to later stages of a vision pipeline.
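Two of the metrics named above, entropy (EN) and spatial frequency (SF), are simple to compute from pixel values alone. The sketch below uses their common textbook definitions; the exact normalization used in the paper's evaluation code is not given in this summary, so treat the details (averaging over the number of pixel differences, integer gray levels, nested lists as images) as assumptions. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy in bits of the gray-level histogram of a 2-D image."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def spatial_frequency(image):
    """Spatial frequency: root of the mean squared horizontal and vertical
    differences between neighbouring pixels. Needs at least a 2x2 image."""
    rows, cols = len(image), len(image[0])
    row_sq = sum(
        (image[i][j] - image[i][j - 1]) ** 2
        for i in range(rows)
        for j in range(1, cols)
    )
    col_sq = sum(
        (image[i][j] - image[i - 1][j]) ** 2
        for i in range(1, rows)
        for j in range(cols)
    )
    row_freq_sq = row_sq / (rows * (cols - 1))
    col_freq_sq = col_sq / ((rows - 1) * cols)
    return math.sqrt(row_freq_sq + col_freq_sq)


# A flat image carries no information and no gradients;
# a checkerboard has two equally likely levels and maximal local contrast.
flat = [[128] * 4 for _ in range(4)]
checker = [[0 if (i + j) % 2 == 0 else 255 for j in range(4)] for i in range(4)]

print(entropy(flat), spatial_frequency(flat))
print(entropy(checker), spatial_frequency(checker))
```

Higher EN indicates that the fused image uses a wider spread of gray levels, and higher SF indicates more edge and texture content. Neither metric compares the fused image against its sources, which is why source-aware metrics such as SSIM are reported alongside them.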

Implications and Future Directions

The correlation-based loss gives the decomposition an explicit objective: shared content is pushed into the low-frequency features and modality-specific content into the high-frequency features. Because the learned features have a stated role, this offers a potential pathway toward more interpretable and efficient fusion models.

Furthermore, the integration of lightweight architectures like LT blocks highlights a growing trend towards achieving a balance between computational efficiency and model efficacy. The combination of CNN and Transformer methodologies within a unified framework also opens avenues for further exploration in hybrid network designs.

Frameworks like CDDFuse are relevant to fields that depend on combining complementary sensors or scans, such as medical imaging and autonomous systems. Future research may explore extending these approaches to real-time systems and further improving model interpretability and efficiency.
