Emergent Mind

DemoFusion: Democratising High-Resolution Image Generation With No $$$

(2311.16973)
Published Nov 24, 2023 in cs.CV , cs.AI , and cs.LG

Abstract

High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration.

DemoFusion framework enhances resolution via "upsample-diffuse-denoise" loop and dilated sampling for coherent content generation.

Overview

  • DemoFusion is a new framework that enhances the resolution capabilities of Latent Diffusion Models (LDMs) for generating high-resolution images without costly training.

  • The framework uses three innovative techniques—Progressive Upscaling, Skip Residual, and Dilated Sampling—to improve image resolution up to 4096 pixels while maintaining semantic integrity.

  • DemoFusion operates on accessible 'working class' GPUs like the RTX 3090, extending the reach of high-resolution image generation to a wider audience.

  • While the framework allows for low-resolution previews for quick iterations, generating high-resolution images can require increased runtime.

  • DemoFusion has practical applications in various fields including digital art and updating images, despite potential limitations such as producing some incoherent content.

Democratizing High-Resolution Image Generation with DemoFusion

In the rapidly evolving field of Generative Artificial Intelligence (GenAI), creating high-resolution imagery has been an area of immense interest and potential. Unfortunately, the computational requirements to train models capable of such feats are substantial, often leaving individuals and academic institutions lagging behind large corporations that can afford such investments. Consequently, open-source GenAI models for high-resolution image generation have become a privilege rather than a widely accessible resource.

Enter DemoFusion—a groundbreaking framework that addresses this imbalance. DemoFusion is designed to extend the capabilities of existing Latent Diffusion Models (LDMs) without necessitating further expensive training or excessive memory. What makes this framework remarkable is its ability to generate images exceeding 4096 pixels in resolution, which is a significant leap from previous models like SDXL that capped at 1024 pixels. This capability is harnessed using a standard RTX 3090 GPU—a 'working class' hardware in the realm of GenAI—making it accessible to a broader user base.

Central to DemoFusion are three pioneering techniques: Progressive Upscaling, Skip Residual, and Dilated Sampling. Progressive Upscaling works iteratively, refining the image multiple times, enhancing resolution with each pass. Meanwhile, Skip Residual uses intermediate, noise-inverted images from the previous resolution as guides to maintain global consistency and semantic coherency at higher resolutions. Dilated Sampling ensures that even as details are added, they intuitively fit within the broader context of the global image.

Despite these advantages, DemoFusion is not without trade-offs. The primary one is the increased runtime required to generate high-resolution images. This can be mitigated by the framework’s ability to provide low-resolution ‘previews’ swiftly, allowing users to iterate prompts quickly before committing to the full high-resolution rendering process.

The practical applications of DemoFusion are promising, demonstrated in various scenarios ranging from enhancing realism in digital art to updating older images to match modern display resolutions. It does, however, have limitations that are important to address. DemoFusion can occasionally produce localized irrational content or small repetitive elements in certain contexts—hurdles that future iterations and updates could potentially overcome.

In summary, DemoFusion is a significant step toward democratizing high-resolution image generation, aligning perfectly with the ethos of making sophisticated GenAI technologies more available to all. Through seamless plug-and-play compatibility with existing open-source models, DemoFusion unlocks the untapped potential of LDMs, amplifying their resolution capabilities while balancing memory demands and computational efficiency. As such, it represents a notable advance in the field, paving the way for more equitable access and innovation in high-resolution image synthesis.

Create an account to read this summary for free:

Newsletter

Get summaries of trending comp sci papers delivered straight to your inbox:

Unsubscribe anytime.

YouTube