Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion (2401.03788v2)

Published 8 Jan 2024 in cs.CV

Abstract: Low-light image enhancement techniques have significantly progressed, but unstable image quality recovery and unsatisfactory visual perception are still significant challenges. To solve these problems, we propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD. Specifically, CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process. Multi-scale supervision across different modalities facilitates the alignment of image features with semantic features during the wavelet diffusion process, effectively bridging the gap between degraded and normal domains. Moreover, to further promote the effective recovery of the image details, we combine the Fourier transform based on the wavelet transform and construct a Hybrid High Frequency Perception Module (HFPM) with a significant perception of the detailed features. This module avoids the diversity confusion of the wavelet diffusion process by guiding the fine-grained structure recovery of the enhancement results to achieve favourable metric and perceptually oriented enhancement. Extensive quantitative and qualitative experiments on publicly available real-world benchmarks show that our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression. The project code is available at https://github.com/hejh8/CFWD.

Summary

The paper presents CFWD, a novel method that fuses wavelet diffusion with Fourier transforms guided by CLIP for effective low-light image enhancement.
It combines multimodal visual-language guidance with a Hybrid High Frequency Perception Module (HFPM) to improve both metric scores such as PSNR and the recovery of structural details.
Experiments on public real-world benchmarks show that CFWD outperforms state-of-the-art techniques, with potential applications in surveillance, photography, and autonomous systems.

Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion

The paper "Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion" presents a novel approach to persistent challenges in low-light image enhancement. Despite significant advances in the field, unstable image quality recovery and unsatisfactory visual perception remain common problems. The authors propose a method, abbreviated as CFWD, which leverages multimodal visual-language information to guide the enhancement process in the frequency domain, combining wavelet and Fourier transforms.

The CFWD method combines classical signal processing techniques with modern neural network paradigms. It employs a wavelet diffusion model, which reduces computational overhead by moving the diffusion process into the wavelet low-frequency domain. Each level of the wavelet transform halves the height and width of the data the diffusion model operates on, so processing becomes more resource-efficient. The method further incorporates a hybrid of wavelet and Fourier transforms within a Hybrid High Frequency Perception Module (HFPM) to capture fine-grained details and keep image content coherent across different lighting conditions.
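To make the downsampling effect concrete, the following is a minimal NumPy sketch of a single-level 2D Haar wavelet transform and its inverse. It is an illustration only, not the authors' implementation: the source does not state which wavelet CFWD uses, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. The point is that the low-frequency band has half the height and width of the input, while the three high-frequency bands hold the detail needed to reconstruct it.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2D Haar transform on an (H, W) array with even H and W.

    Returns the low-frequency band and a tuple of three high-frequency bands,
    each of shape (H/2, W/2). Haar is an assumption made for illustration.
    """
    img = np.asarray(img, dtype=np.float64)
    a = img[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0      # local average: low-frequency content
    high_1 = (a + b - c - d) / 2.0   # difference between top and bottom rows
    high_2 = (a - b + c - d) / 2.0   # difference between left and right columns
    high_3 = (a - b - c + d) / 2.0   # diagonal difference
    return low, (high_1, high_2, high_3)

def haar_idwt2(low, highs):
    """Inverse of haar_dwt2: rebuilds the (2H, 2W) image from the four bands."""
    high_1, high_2, high_3 = highs
    h, w = low.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (low + high_1 + high_2 + high_3) / 2.0
    out[0::2, 1::2] = (low + high_1 - high_2 - high_3) / 2.0
    out[1::2, 0::2] = (low - high_1 + high_2 - high_3) / 2.0
    out[1::2, 1::2] = (low - high_1 - high_2 + high_3) / 2.0
    return out

# A random array stands in for a 64x64 grayscale image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
low_1, highs_1 = haar_dwt2(image)   # low_1 has shape (32, 32)
low_2, highs_2 = haar_dwt2(low_1)   # low_2 has shape (16, 16)
```

After two levels, a model operating on `low_2` sees one sixteenth of the original pixel count, which is the kind of saving the paragraph above describes. The high-frequency bands are kept aside so the full-resolution image can be rebuilt afterwards.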

One of the paper's key contributions is the integration of the Contrastive Language-Image Pre-training (CLIP) model, enabling robust multimodal semantic alignment. The authors develop a multiscale visual-language guidance network that iteratively enhances image quality. By integrating visual-language prompts within the diffusion process, this network aligns low-light image features with semantic meanings, improving both metric-oriented and perceptual outcomes.
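The general idea of prompt-based guidance can be sketched as a loss that rewards an image embedding for being closer to a "well-lit" text prompt than to a "low-light" one. The sketch below is an assumption-laden illustration, not the loss used in the paper: the summary does not give CFWD's formulation, the function names are hypothetical, the temperature value is illustrative, and the small hand-written vectors stand in for embeddings that a real CLIP image and text encoder would produce.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two 1-D embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def prompt_guidance_loss(image_emb, positive_emb, negative_emb, temperature=0.07):
    """Cross-entropy over two prompts; low when the image matches the positive one.

    image_emb, positive_emb, negative_emb are placeholders for encoder outputs.
    The temperature is an illustrative choice, not a value taken from the paper.
    """
    logits = np.array([
        cosine_similarity(image_emb, positive_emb),
        cosine_similarity(image_emb, negative_emb),
    ]) / temperature
    logits = logits - logits.max()              # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))             # negative log-probability of the positive prompt

# Toy 3-D embeddings (hypothetical; real CLIP embeddings have hundreds of dimensions).
normal_light_prompt = np.array([1.0, 0.0, 0.0])
low_light_prompt = np.array([0.0, 1.0, 0.0])
enhanced_image = np.array([0.9, 0.1, 0.0])
dark_image = np.array([0.1, 0.9, 0.0])

loss_enhanced = prompt_guidance_loss(enhanced_image, normal_light_prompt, low_light_prompt)
loss_dark = prompt_guidance_loss(dark_image, normal_light_prompt, low_light_prompt)
```

In this toy setup `loss_enhanced` is smaller than `loss_dark`, so minimizing the loss during training would push the enhanced output toward the semantics of the positive prompt. A multiscale variant, as described above, would apply such a term at several resolutions.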

Extensive experiments performed on standard benchmark datasets demonstrate that CFWD outperforms state-of-the-art methods across various metrics, such as PSNR, SSIM, LPIPS, and FID. Particularly notable is the visual quality of the results, highlighted in comparisons against existing techniques. The quantitative assessments indicate substantial progress in both brightness and detail preservation, showcasing CFWD's capability to deliver images with realistic visual appeal.
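Of the metrics listed, PSNR is the simplest to state: it is $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and the enhanced image. The sketch below is a generic NumPy version of that formula, not the evaluation code used in the paper. The function name `psnr` and the default `max_value=1.0` (images scaled to [0, 1]) are assumptions.

```python
import math
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two same-shaped arrays.

    max_value is the largest possible pixel value (1.0 for images in [0, 1],
    255 for 8-bit images). Returns infinity when the images are identical.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = float(np.mean((reference - estimate) ** 2))
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)
score = psnr(reference, estimate)
```

Higher PSNR means a smaller pixel-wise error. SSIM, LPIPS, and FID measure structural and perceptual similarity instead, which is why results are typically reported on several of these metrics together.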

The implications of CFWD extend beyond just theoretical contributions. Practically, this approach provides a framework that could be adapted for real-world applications such as surveillance, autonomous vehicles, and digital photography, where low-light conditions frequently pose challenges. Theoretically, the amalgamation of wavelet transformation, Fourier analysis, and diffusion models opens new avenues for research on multi-resolution and multi-modal image processing techniques.

The paper points to directions for future work, including improving the computational efficiency of the proposed method and refining the visual-language guidance mechanism. Adaptive techniques for fine-tuning the hybrid frequency-domain perception module are another open direction. This research contributes to the broader field of image processing, offering insights into enhancing image quality under challenging lighting conditions. As AI and machine learning continue to advance, methods like CFWD may serve as foundations on which more sophisticated solutions are built.