MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance (2312.11396v2)

Published 18 Dec 2023 in cs.CV and cs.AI

Abstract: Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite growing real-world demand. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method that enables localized image editing in complex scenarios. In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints of the edit token, which in turn gradually enhances the local alignment with the desired prompt. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method in achieving both text alignment and structure preservation for localized editing within complex scenarios.
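
The core mechanism the abstract describes — inference-stage gradient optimization of the noise latent against mask-based cross-attention constraints on the edit token — can be illustrated with a small sketch. The two functional forms below (an in-mask attention ratio and an in-mask magnitude shortfall with threshold `tau`) are assumptions made for illustration, not the paper's exact losses, and a freely optimized tensor stands in for the diffusion latent.

```python
import torch

def mag_edit_constraints(attn, mask, tau=0.1):
    """Two assumed mask-based cross-attention constraints for the edit token.

    attn: (H, W) non-negative attention map for the edit token (sums to 1 here).
    mask: (H, W) binary {0, 1} mask of the edit region.
    """
    # Constraint 1: fraction of the edit token's attention falling inside the mask.
    ratio = (attn * mask).sum() / (attn.sum() + 1e-8)
    # Constraint 2: average shortfall of in-mask attention below threshold tau.
    shortfall = (torch.clamp(tau - attn, min=0.0) * mask).sum() / (mask.sum() + 1e-8)
    # Maximizing both constraints is equivalent to minimizing this loss.
    return (1.0 - ratio) + shortfall

# Toy stand-in for inference-stage latent optimization: gradient steps on a
# free "latent" whose softmax plays the role of the edit token's attention map.
torch.manual_seed(0)
H = W = 16
latent = torch.randn(H, W, requires_grad=True)
mask = torch.zeros(H, W)
mask[4:12, 4:12] = 1.0  # edit region

optimizer = torch.optim.Adam([latent], lr=0.05)
for _ in range(200):
    attn = torch.softmax(latent.flatten(), dim=0).view(H, W)
    loss = mag_edit_constraints(attn, mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    attn = torch.softmax(latent.flatten(), dim=0).view(H, W)
    print(f"in-mask attention ratio: {((attn * mask).sum() / attn.sum()).item():.3f}")
```

In the actual method, `attn` would be the denoising UNet's cross-attention map for the edit token at each sampling step, and the gradient would update the noise latent z_t itself before continuing the denoising trajectory, which is what makes the approach training-free.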
