- The paper introduces a novel unsupervised framework that fuses misaligned infrared and visible images via a three-stage process involving cross-modality style transfer, image registration, and dual-path fusion.
- It employs the Cross-modality Perceptual Style Transfer Network to generate pseudo-infrared images that retain sharp geometric structures, facilitating effective image registration.
- Experimental results on the TNO and RoadScene datasets show significant improvements in MI and SSIM, underscoring the framework's potential in autonomous driving and surveillance.
Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration
The paper "Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration" introduces a novel framework designed to address the challenges of fusing misaligned infrared (IR) and visible images. Traditional methods that operate on pre-registered images often falter when faced with misalignments due to discrepancies in intensity and spatial deformations. This work proposes a robust solution that combines cross-modality image generation with a registration paradigm, allowing for enhanced alignment and fusion of IR and visible images.
The core of the proposed method is a three-stage process: cross-modality image generation, mono-modality image registration, and interaction fusion. At the heart of the generation stage is the Cross-modality Perceptual Style Transfer Network (CPSTN), which translates the visible input into a pseudo-infrared image. This translation reduces the cross-modality discrepancy while preserving sharp geometric structures, which is crucial because it turns the difficult cross-modality registration problem into an easier mono-modality one: the distorted infrared image can subsequently be registered against the pseudo-infrared image.
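To make the generation step concrete, the sketch below shows a minimal visible-to-pseudo-infrared generator in PyTorch. It is not the paper's CPSTN: the residual-block layout, channel counts, and the `PseudoIRGenerator` name are illustrative assumptions, and the perceptual style-transfer losses used for training are omitted.

```python
# Minimal sketch of a visible-to-pseudo-infrared generator.
# Not the paper's exact CPSTN; layer sizes and names are assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class PseudoIRGenerator(nn.Module):
    """Maps a 3-channel visible image to a 1-channel pseudo-infrared image."""
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True))
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 1, 7, padding=3)

    def forward(self, vis):
        return torch.tanh(self.tail(self.body(self.head(vis))))

# usage: pseudo_ir = PseudoIRGenerator()(visible_batch)  # (B, 3, H, W) -> (B, 1, H, W)
```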
For the registration phase, the Multi-level Refinement Registration Network (MRRN) takes the distorted infrared image and the pseudo-infrared image and predicts a dense deformation field, which is used to warp the infrared image and rectify its spatial misalignment within a single modality. This step is pivotal, as it corrects geometric distortions while preserving structural fidelity.
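The following sketch illustrates how a predicted deformation field can be applied to warp the distorted infrared image toward the pseudo-infrared reference with `grid_sample`. The multi-level refinement network that actually predicts the field is not reproduced here, and the `warp_with_field` helper and its pixel-offset convention are assumptions rather than the paper's exact formulation.

```python
# Sketch: warp an image with a per-pixel deformation field (dx, dy in pixels).
import torch
import torch.nn.functional as F

def warp_with_field(image, flow):
    """image: (B, C, H, W); flow: (B, 2, H, W) pixel offsets (dx, dy)."""
    b, _, h, w = image.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=image.device),
        torch.linspace(-1, 1, w, device=image.device),
        indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel offsets to normalized offsets and add them to the base grid.
    norm_flow = torch.stack(
        (2.0 * flow[:, 0] / max(w - 1, 1), 2.0 * flow[:, 1] / max(h - 1, 1)), dim=-1)
    grid = base + norm_flow
    return F.grid_sample(image, grid, align_corners=True)
```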
Finally, the fusion of registered infrared and visible images is achieved via the Dual-path Interaction Fusion Network (DIFN). A critical component of DIFN is the Interaction Fusion Module (IFM), which selects and combines significant features from each modality to produce a fused image enriched with textures and details. This approach mitigates the oversmoothing often observed with conventional fusion techniques.
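A minimal sketch of one dual-path interaction step is given below. The soft, learned per-channel and per-pixel selection between IR and visible features is an assumed stand-in for the paper's IFM, not its actual design; the `InteractionFusion` name and channel sizes are likewise illustrative.

```python
# Sketch: fuse features from the registered IR and visible branches
# via a learned soft selection (assumed scheme, not the paper's exact IFM).
import torch
import torch.nn as nn

class InteractionFusion(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.merge = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, feat_ir, feat_vis):
        # Per-pixel, per-channel weight deciding how much each modality contributes.
        w = self.gate(torch.cat((feat_ir, feat_vis), dim=1))
        fused = w * feat_ir + (1.0 - w) * feat_vis
        return self.merge(fused)
```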
The experimental evaluation on the TNO and RoadScene datasets demonstrates the efficacy of the proposed method, which outperforms several state-of-the-art infrared-visible image fusion (IVIF) techniques. Notably, significant improvements were observed in metrics such as Mutual Information (MI) and the Structural Similarity Index (SSIM), underscoring the method's ability to reduce ghosting artifacts and achieve accurate alignment.
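For reference, the two reported metrics can be computed along the following lines. The histogram binning and the convention of summing scores over both source images are assumptions about the evaluation protocol, not details taken from the paper.

```python
# Sketch: mutual information from a joint histogram (NumPy) and SSIM via scikit-image.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mutual_information(a, b, bins=256):
    """MI between a source image and the fused image (uint8 grayscale arrays)."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Fusion MI is commonly reported as MI(ir, fused) + MI(vis, fused);
# SSIM analogously, e.g. ssim(ir, fused) + ssim(vis, fused), depending on the protocol.
```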
Key contributions of the paper include:
- A novel unsupervised framework capable of handling misaligned multispectral image pairs through a generation-registration paradigm.
- A Cross-modality Perceptual Style Transfer Network (CPSTN) that emphasizes geometric structure retention, easing subsequent registration and fusion.
- An Interaction Fusion Module (IFM) that adaptively combines features, preserving details and avoiding the artifacts commonly introduced by naive feature fusion.
The implications of this research are substantial, offering practical applications in fields like autonomous driving and surveillance, where the fusion of infrared and visible imagery is vital for enhanced visibility and situational awareness. The methodology not only bridges the gap in multi-modality image fusion under non-ideal pre-registration conditions but also inspires further exploration of unsupervised learning techniques in refining multispectral imagery.
Future research may focus on the adaptation and scaling of the proposed method to accommodate real-time applications and extended modalities, examining the influence of diverse environmental conditions on fusion quality. By continuing to refine these techniques, there is potential for profound impacts across various domains dependent on multispectral information fusion.