- The paper introduces a novel unsupervised framework that fuses misaligned infrared and visible images via a three-stage process involving cross-modality style transfer, image registration, and dual-path fusion.
- It employs the Cross-modality Perceptual Style Transfer Network to generate pseudo-infrared images that retain sharp geometric structures, facilitating effective image registration.
- Experimental results on the TNO and RoadScene datasets show significant improvements in MI and SSIM, underscoring the framework's potential in autonomous driving and surveillance.
Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration
The paper "Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration" introduces a novel framework designed to address the challenges of fusing misaligned infrared (IR) and visible images. Traditional methods that operate on pre-registered images often falter when faced with misalignments due to discrepancies in intensity and spatial deformations. This work proposes a robust solution that combines cross-modality image generation with a registration paradigm, allowing for enhanced alignment and fusion of IR and visible images.
The core of the proposed method is a three-stage process: cross-modality image generation, mono-modality image registration, and interaction fusion. At the heart of the generation stage is the Cross-modality Perceptual Style Transfer Network (CPSTN), which translates the visible input into a pseudo-infrared image. This translation reduces the cross-modality discrepancy while preserving sharp geometric structures, which is crucial because it turns the difficult cross-modality registration problem into an easier mono-modality one: the distorted infrared image can subsequently be registered against the pseudo-infrared image.
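To make the generation step concrete, the sketch below shows a minimal visible-to-pseudo-infrared generator in PyTorch. It is not the paper's CPSTN: the residual-block layout, channel counts, and the `PseudoIRGenerator` name are illustrative assumptions, and the perceptual style-transfer losses used for training are omitted.

```python
# Minimal sketch of a visible-to-pseudo-infrared generator.
# Not the paper's exact CPSTN; layer sizes and names are assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PseudoIRGenerator(nn.Module):
    """Maps a 3-channel visible image to a 1-channel pseudo-infrared image."""
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True))
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 1, 7, padding=3)

    def forward(self, vis):
        return torch.tanh(self.tail(self.body(self.head(vis))))

# usage: pseudo_ir = PseudoIRGenerator()(visible_batch)  # (B, 3, H, W) -> (B, 1, H, W)
```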
For the registration phase, the Multi-level Refinement Registration Network (MRRN) takes the distorted infrared image and the pseudo-infrared image and predicts a dense deformation field, which is used to warp the infrared image and rectify its spatial misalignment within a single modality. This step is pivotal, as it corrects geometric distortions while preserving structural fidelity.
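The following sketch illustrates how a predicted deformation field can be applied to warp the distorted infrared image toward the pseudo-infrared reference with `grid_sample`. The multi-level refinement network that actually predicts the field is not reproduced here, and the `warp_with_field` helper and its pixel-offset convention are assumptions rather than the paper's exact formulation.

```python
# Sketch: warp an image with a per-pixel deformation field (dx, dy in pixels).
import torch
import torch.nn.functional as F

def warp_with_field(image, flow):
    """image: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)."""
    b, _, h, w = image.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=image.device),
        torch.linspace(-1, 1, w, device=image.device),
        indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel offsets to normalized offsets and add them to the base grid.
    norm_flow = torch.stack(
        (2.0 * flow[:, 0] / max(w - 1, 1), 2.0 * flow[:, 1] / max(h - 1, 1)), dim=-1)
    grid = base + norm_flow
    return F.grid_sample(image, grid, align_corners=True)
```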
Finally, the fusion of registered infrared and visible images is achieved via the Dual-path Interaction Fusion Network (DIFN). A critical component of DIFN is the Interaction Fusion Module (IFM), which selects and combines significant features from each modality to produce a fused image enriched with textures and details. This approach mitigates the oversmoothing often observed with conventional fusion techniques.
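A minimal sketch of one dual-path interaction step is given below. The soft, learned per-channel and per-pixel selection between IR and visible features is an assumed stand-in for the paper's IFM, not its actual design; the `InteractionFusion` name and channel sizes are likewise illustrative.

```python
# Sketch: fuse features from the registered IR and visible branches
# via a learned soft selection (assumed scheme, not the paper's exact IFM).
import torch
import torch.nn as nn

class InteractionFusion(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.merge = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, feat_ir, feat_vis):
        # Per-pixel, per-channel weight deciding how much each modality contributes.
        w = self.gate(torch.cat((feat_ir, feat_vis), dim=1))
        fused = w * feat_ir + (1.0 - w) * feat_vis
        return self.merge(fused)
```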
The experimental evaluation on the TNO and RoadScene datasets demonstrates the efficacy of the proposed method, which outperforms several state-of-the-art infrared-visible image fusion (IVIF) techniques. Notably, significant improvements were observed in metrics such as Mutual Information (MI) and the Structural Similarity Index (SSIM), underscoring the method's ability to reduce ghosting artifacts and achieve accurate alignment.
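For reference, the two reported metrics can be computed along the following lines. The histogram binning and the convention of summing scores over both source images are assumptions about the evaluation protocol, not details taken from the paper.

```python
# Sketch: mutual information from a joint histogram (NumPy) and SSIM via scikit-image.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mutual_information(a, b, bins=256):
    """MI between a source image and the fused image (uint8 grayscale arrays)."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Fusion MI is commonly reported as MI(ir, fused) + MI(vis, fused);
# SSIM analogously, e.g. ssim(ir, fused) + ssim(vis, fused), depending on the protocol.
```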
Key contributions of the paper include:
- A novel unsupervised framework capable of handling misaligned multispectral image pairs through a generation-registration paradigm.
- A Cross-modality Perceptual Style Transfer Network (CPSTN) that emphasizes geometric structure retention, easing subsequent registration and fusion.
- An Interaction Fusion Module (IFM) that adaptively combines features, preserving details and avoiding the artifacts commonly introduced by naive feature fusion.
The implications of this research are substantial, offering practical applications in fields like autonomous driving and surveillance, where the fusion of infrared and visible imagery is vital for enhanced visibility and situational awareness. The methodology not only bridges the gap in multi-modality image fusion under non-ideal pre-registration conditions but also inspires further exploration of unsupervised learning techniques in refining multispectral imagery.
Future research may focus on the adaptation and scaling of the proposed method to accommodate real-time applications and extended modalities, examining the influence of diverse environmental conditions on fusion quality. By continuing to refine these techniques, there is potential for profound impacts across various domains dependent on multispectral information fusion.