Abstract

We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON). We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the garment detail features. Without a redundant warping process, the garment features are precisely aligned with the target human body via the proposed outfitting fusion in the self-attention layers of the denoising UNet. To further enhance controllability, we introduce outfitting dropout to the training process, which enables us to adjust the strength of the garment features through classifier-free guidance. Our comprehensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results for arbitrary human and garment images, outperforming other VTON methods in both realism and controllability and marking a notable advance in virtual try-on. Our source code is available at https://github.com/levihsu/OOTDiffusion.

Figure: Overview of the OOTDiffusion architecture. Garment images are encoded in the latent space and processed by the outfitting UNet; their features are fused into the denoising UNet, which operates on Gaussian noise and is additionally conditioned on CLIP garment embeddings.

Overview

  • OOTDiffusion introduces an advanced virtual try-on (VTON) technology utilizing latent diffusion models (LDMs) to produce realistic and controllable try-on images without explicit warping.

  • The model incorporates an outfitting UNet and an outfitting fusion process in the latent space for improved garment detail preservation and a natural fit across various body postures.

  • OOTDiffusion outperforms state-of-the-art VTON methods in fidelity, detail preservation, and realistic garment integration as validated on VITON-HD and Dress Code datasets.

  • The research signals a shift toward more efficient VTON methodologies and opens new avenues for applying latent diffusion models to fashion e-commerce and to improving image-based virtual try-on quality.

Outfitting over Try-on Diffusion: Elevating Virtual Try-On with Latent Diffusion Models

Introduction

In the evolving sphere of e-commerce, the demand for advanced virtual try-on (VTON) technologies has surged, aiming to make the digital shopping experience more immersive and personalized. Addressing this, the paper presents Outfitting over Try-on Diffusion (OOTDiffusion), a novel approach designed to harness pretrained latent diffusion models (LDMs) for generating realistic and controllable virtual try-on images. Unlike existing methods that rely primarily on warping modules or GAN architectures, OOTDiffusion introduces an outfitting UNet integrated with an outfitting fusion process. This method merges garment details with the target human image in the latent space, improving the fidelity and detail preservation of the try-on result without an explicit warping process.

Methodology

OOTDiffusion's methodology centers on three core components:

  • Outfitting UNet: A dedicated network that learns garment detail features directly in the latent space, eliminating the need for a lossy warping process.
  • Outfitting Fusion: Integrates the learned garment features with the target human representation inside the self-attention layers of the denoising UNet, yielding a natural fit of garments over varying body postures (see the first sketch after this list).
  • Outfitting Dropout: Applied during training so that, via classifier-free guidance, the strength of the garment features in the output can be adjusted at inference (see the second sketch below).
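
To make the fusion step concrete, here is a minimal sketch in PyTorch. It assumes flattened (batch, tokens, channels) feature maps and the attention layer's usual linear projections; the function name and the slice-after-attention behaviour are illustrative simplifications, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def outfitting_fusion(human_feat, garment_feat, to_q, to_k, to_v):
    """Illustrative garment fusion inside a self-attention layer.

    human_feat, garment_feat: (batch, tokens, dim) flattened feature maps
    from the denoising UNet and the outfitting UNet, respectively.
    to_q, to_k, to_v: the attention layer's linear projections.
    """
    n_human = human_feat.shape[1]
    # Concatenate along the token (spatial) dimension so that human tokens
    # can attend to garment tokens -- no explicit warping step is needed.
    fused = torch.cat([human_feat, garment_feat], dim=1)
    q, k, v = to_q(fused), to_k(fused), to_v(fused)
    out = F.scaled_dot_product_attention(q, k, v)  # plain self-attention
    # Keep only the human half of the output; garment detail has been
    # attended into those tokens.
    return out[:, :n_human, :]

# Toy usage: 320-dim features over a 32x32 latent grid.
to_q, to_k, to_v = (torch.nn.Linear(320, 320) for _ in range(3))
h = torch.randn(1, 1024, 320)  # human tokens
g = torch.randn(1, 1024, 320)  # garment tokens
out = outfitting_fusion(h, g, to_q, to_k, to_v)  # -> (1, 1024, 320)
```

The attention weights implicitly decide where each garment feature lands on the body, which is why no separate warping module is required.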

Together, these components enable OOTDiffusion to generate outfitted images that are both realistic and exceptionally faithful to garment detail. The model has been rigorously evaluated on two high-resolution VTON datasets, VITON-HD and Dress Code, showing superior performance over contemporary state-of-the-art VTON methods.
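
Outfitting dropout and the resulting classifier-free guidance can be sketched as follows. The names drop_prob, guidance_scale, and the denoiser callable are assumptions for illustration, not the paper's actual interface.

```python
import torch

def apply_outfitting_dropout(garment_latent, drop_prob=0.1):
    """Training-time sketch: with probability drop_prob, replace the
    garment condition with a null (zero) latent so the model also learns
    an unconditional prediction."""
    if torch.rand(()).item() < drop_prob:
        return torch.zeros_like(garment_latent)
    return garment_latent

def guided_noise_prediction(denoiser, z_t, t, garment_latent, guidance_scale=1.5):
    """Inference-time classifier-free guidance sketch:
        eps = eps_uncond + s * (eps_cond - eps_uncond)
    where s (guidance_scale) adjusts the strength of the garment features."""
    eps_cond = denoiser(z_t, t, garment_latent)
    eps_uncond = denoiser(z_t, t, torch.zeros_like(garment_latent))
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

A guidance scale of 1 reproduces the conditional prediction; larger values push the result harder toward the garment condition, which is the controllability knob discussed in the findings below.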

Findings

The quantitative and qualitative assessments underscore the efficacy of OOTDiffusion in producing outfitted images that align closely with various human poses while preserving intricate garment details. Notably, the model outperforms existing methods on standard metrics, including LPIPS, SSIM, FID, and KID, demonstrating its capability to generate more realistic and detailed try-on images. Omitting explicit warping not only retains the fidelity of garment textures and patterns but also yields a more natural integration of the garment with the human body. Moreover, the outfitting dropout mechanism effectively balances fidelity and controllability, allowing the influence of the garment features on the outfitted result to be adjusted.
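
For readers who want to reproduce this kind of evaluation, below is a minimal sketch using the torchmetrics library (it needs the image extras, e.g. torch-fidelity, installed). The configuration values are common defaults, not the settings used in the paper.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity
from torchmetrics.image import StructuralSimilarityIndexMeasure

# Paired metrics (LPIPS, SSIM) compare each generated image with its ground
# truth; distribution metrics (FID, KID) compare the two image sets as a whole.
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
fid = FrechetInceptionDistance(feature=2048, normalize=True)
kid = KernelInceptionDistance(subset_size=50, normalize=True)  # subset_size <= N

def evaluate(generated, real):
    """generated, real: (N, 3, H, W) float tensors scaled to [0, 1]."""
    fid.update(real, real=True)
    fid.update(generated, real=False)
    kid.update(real, real=True)
    kid.update(generated, real=False)
    kid_mean, _ = kid.compute()
    return {
        "LPIPS": lpips(generated, real).item(),  # lower is better
        "SSIM": ssim(generated, real).item(),    # higher is better
        "FID": fid.compute().item(),             # lower is better
        "KID": kid_mean.item(),                  # lower is better
    }
```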

Implications and Future Directions

The advancement presented by OOTDiffusion opens new avenues for the application of latent diffusion models in the virtual try-on domain. The absence of explicit garment warping and the introduction of outfitting fusion highlight a paradigm shift towards more efficient and detail-preserving VTON methodologies. This research not only sets a new benchmark for image-based virtual try-on quality but also lays the groundwork for future explorations into controllable and realistic image synthesis within fashion e-commerce and beyond.

Practically, the integration of such technologies could revolutionize online shopping, providing customers with a more accurate and engaging means to visualize garments. From a theoretical standpoint, the findings encourage further investigation into latent space manipulation and the role of diffusion models in complex image synthesis tasks.

As e-commerce platforms strive to offer more personalized and interactive shopping experiences, the significance of advancements like OOTDiffusion cannot be overstated. Future research may explore extending these methodologies to accommodate a wider range of garments and poses, alongside enhancing the model's generalization capabilities across diverse datasets.

In conclusion, OOTDiffusion heralds a significant step forward in the realm of virtual try-on technology, promising more immersive and realistic shopping experiences. Its success in leveraging latent diffusion for high-fidelity and controllable VTON opens the door to numerous potential applications and further innovations in the digital fashion industry.
