
Improving Diffusion Models for Authentic Virtual Try-on in the Wild (2403.05139v3)

Published 8 Mar 2024 in cs.CV

Abstract: This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve the identity of the garments. To overcome this limitation, we propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. Our method, coined IDM-VTON, uses two different modules to encode the semantics of garment image; given the base UNet of the diffusion model, 1) the high-level semantics extracted from a visual encoder are fused to the cross-attention layer, and then 2) the low-level features extracted from parallel UNet are fused to the self-attention layer. In addition, we provide detailed textual prompts for both garment and person images to enhance the authenticity of the generated visuals. Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Our experimental results show that our method outperforms previous approaches (both diffusion-based and GAN-based) in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. Furthermore, the proposed customization method demonstrates its effectiveness in a real-world scenario. More visualizations are available in our project page: https://idm-vton.github.io


Summary

  • The paper introduces IDM-VTON, a novel diffusion model architecture that leverages dual attention modules to maintain high garment fidelity.
  • It integrates TryonNet, IP-Adapter, and GarmentNet, achieving superior LPIPS, SSIM, and FID scores compared to previous methods.
  • The study demonstrates robust performance in complex poses and backgrounds, offering promising applications in online retail and virtual reality.

Enhancements in Diffusion Models for Virtual Try-On

Introduction

The paper "Improving Diffusion Models for Authentic Virtual Try-On in the Wild" explores an innovative approach to virtual try-on applications using diffusion models. It addresses a significant challenge in e-commerce and fashion technology: generating realistic images of a person wearing a given garment from just two images—one of the person and another of the garment. Despite advances in generative models, existing techniques often compromise garment identity or image authenticity, issues that this research aims to mitigate through a novel model architecture called IDM-VTON.

Methodology

The IDM-VTON model leverages a unique architecture that improves upon traditional diffusion models by incorporating dual attention modules designed to enhance garment fidelity and realism in try-on images. The model takes a person image and a garment image as input and processes them through distinct pathways, maintaining high fidelity of garment features while ensuring the natural appearance of try-on images.

The core architecture consists of three main components:

  1. TryonNet: A base UNet responsible for processing the person image, enhanced with additional inputs such as a segmentation mask and pose information.
  2. IP-Adapter: An image prompt adapter that encodes high-level semantic information from the garment image using a pretrained CLIP model, contributing to maintaining garment identity.
  3. GarmentNet: A specialized UNet that captures fine-grained garment details, such as textures and patterns, which are integrated into TryonNet's processing pipeline through a self-attention mechanism.

    Figure 1: Overview of IDM-VTON, highlighting the architecture with its key components: TryonNet, IP-Adapter, and GarmentNet.
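The two fusion pathways described above can be sketched in plain NumPy: high-level garment embeddings (IP-Adapter) enter as extra keys/values in cross-attention, while low-level garment features (GarmentNet) are concatenated with the person tokens along the sequence axis before self-attention. All shapes, token counts, and names here are illustrative assumptions, not the paper's actual implementation, which operates inside a diffusion UNet.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 64                                    # hypothetical feature dimension
person_tokens = rng.normal(size=(16, d))  # TryonNet latent tokens
text_tokens = rng.normal(size=(8, d))     # text-prompt embeddings
clip_garment = rng.normal(size=(4, d))    # IP-Adapter: high-level garment semantics
garm_tokens = rng.normal(size=(16, d))    # GarmentNet: low-level garment features

# 1) Cross-attention: text tokens and CLIP garment embeddings jointly
#    serve as keys/values for the person (query) tokens.
kv = np.concatenate([text_tokens, clip_garment], axis=0)
x = attention(person_tokens, kv, kv)

# 2) Self-attention: GarmentNet features are concatenated with the person
#    tokens, attended over jointly, then the garment half is discarded.
joint = np.concatenate([x, garm_tokens], axis=0)
out = attention(joint, joint, joint)[: person_tokens.shape[0]]
print(out.shape)  # (16, 64)
```

The key design point this sketch mirrors is that garment information is injected at two different levels of abstraction rather than through a single conditioning signal.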

Qualitative and Quantitative Evaluation

The research presented comprehensive qualitative and quantitative evaluations across several datasets. It demonstrated superior performance over prior methods in maintaining garment detail and creating realistic composite images. Notably, IDM-VTON excelled in scenarios involving diverse poses and intricate backgrounds, as evidenced by robust results on the challenging In-the-Wild dataset.

Quantitative metrics highlighted include:

  • LPIPS and SSIM: For assessing perceptual similarity and structural fidelity, IDM-VTON significantly outperformed GAN-based and previous diffusion-based methods.
  • FID scores: Indicating the fidelity and realism of the generated images, IDM-VTON reported lower FID scores, demonstrating superior quality.

    Figure 2: Comparisons between the datasets used, emphasizing the In-the-Wild dataset's complexity.

    Figure 3: Qualitative results on the VITON-HD and DressCode datasets show the enhanced detail and consistency IDM-VTON maintains.
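For intuition about what these metrics measure, a simplified SSIM can be computed directly; LPIPS and FID additionally require pretrained networks (AlexNet/VGG features and an Inception model, respectively), so only SSIM is sketched here. This is a single-window global variant for illustration, not the sliding-window implementation used in actual evaluation pipelines.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Simplified single-window SSIM over whole grayscale images
    (no Gaussian sliding window, no per-channel handling)."""
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(ssim_global(img, img))  # identical images -> 1.0
print(ssim_global(img, np.clip(img + 0.05, 0.0, 1.0)))  # slightly degraded copy
```

Higher SSIM (closer to 1) and lower LPIPS/FID indicate better agreement with the reference images, which is the direction of improvement the paper reports for IDM-VTON.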

Customization and Adaptation

The paper also explored a customization technique that adapts the model to unseen scenarios by fine-tuning it on a single pair of person and garment images. This approach allowed IDM-VTON to adjust to specific garment-person configurations, preserving image fidelity in varied real-world contexts.
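The core idea of pair-based customization—freeze most of a pretrained model and fit a small trainable part to one person-garment example—can be illustrated with a toy linear model. Everything here (the linear encoder/decoder, shapes, learning-rate choice) is an illustrative assumption; the paper fine-tunes a diffusion UNet, not a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
enc = rng.normal(size=(32, 16))        # frozen "encoder" weights
dec = rng.normal(size=(16, 32)) * 0.1  # trainable "decoder" weights

x = rng.normal(size=(1, 32))           # the single customization input
target = rng.normal(size=(1, 32))      # its paired ground-truth output

h = x @ enc                            # frozen forward pass (computed once)
lr = 0.5 / float(h @ h.T)              # stable step size for this toy loss
for _ in range(200):
    err = h @ dec - target             # gradient of 0.5 * ||pred - target||^2 w.r.t. pred
    dec -= lr * (h.T @ err)            # update only the decoder weights

mse = float(((h @ dec - target) ** 2).mean())
print(mse)  # ~0: the model has adapted to the single pair
```

The takeaway is that with most weights frozen, even one example pair suffices to specialize the trainable part, which is what makes single-pair customization practical.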

Implications and Future Work

The paper significantly contributes to the field by demonstrating that detailed garment information and adaptive architectures in diffusion models can produce higher-quality virtual try-on results than existing methods. The methodology has applications beyond fashion, including virtual reality and online retail, where personalized and accurate visual representations are crucial.

The exploration of detailed textual descriptions for garments, alongside image data, hints at future integrations of multimodal data sources for improved model performance. The potential integration of real-time try-on capabilities and further refinement in image realism through neural training could herald transformative changes in digital retail experiences.

Conclusion

In conclusion, the IDM-VTON model marks a pivotal advancement in virtual try-on technology through its architectural innovations in diffusion models, demonstrating enhanced capabilities in creating authentic images that maintain garment identity and wearer realism. This paper lays groundwork for future research into adaptive generative models and their application across technology and commerce sectors.
