Abstract

Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to a reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy. Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences. For evaluation, we introduce two new pairwise preference datasets, Pick-Style and Pick-Safety, which comprise self-generated image pairs from SDXL and simulate diverse scenarios of reference mismatch. Our experiments validate that MaPO significantly improves alignment on Pick-Style and Pick-Safety, as well as general preference alignment when used with Pick-a-Pic v2, surpassing the base SDXL and other existing methods. Our code, models, and datasets are publicly available at https://mapo-t2i.github.io.

Figure: Margin-aware preference optimization (MaPO) using style-specific prompts and self-curated offline preference data.

Overview

  • The paper introduces Margin-aware Preference Optimization (MaPO), a novel technique to align text-to-image diffusion models with human preferences without relying on reference models, addressing the issue of 'reference mismatch.'

  • MaPO maximizes the likelihood margin between preferred and dispreferred images to instill stylistic preferences and ensure better alignment with human feedback, validated through new datasets Pick-Style and Pick-Safety.

  • The paper showcases MaPO's effectiveness through substantial improvements in alignment, training speed, and memory efficiency, positioning it as a powerful tool for personalized and safer generative applications.

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

The paper "Margin-aware Preference Optimization for Aligning Diffusion Models without Reference" by Jiwoo Hong et al. addresses the significant challenges and limitations of using reference models in preference optimization of diffusion models. It introduces a novel technique named margin-aware preference optimization (MaPO) that seeks to mitigate the issues arising from "reference mismatch."

Context and Motivation

Text-to-image diffusion models have demonstrated substantial capabilities in generating detailed, high-quality images conditioned on textual prompts. However, aligning these models with human preferences remains challenging, especially when the preference data diverges from the reference model, an issue termed "reference mismatch." Traditional methods such as reinforcement learning from human feedback (RLHF) and Direct Preference Optimization (DPO) rely heavily on divergence regularization relative to a reference model, which can limit flexibility during alignment and reduce the model's ability to adapt to human preferences efficiently.
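
For context, the standard DPO objective (quoted here in its original, general form rather than the paper's diffusion-specific notation) makes the reference model's role explicit. With policy \(\pi_\theta\), reference \(\pi_{\mathrm{ref}}\), preferred sample \(y_w\), dispreferred sample \(y_l\), and temperature \(\beta\):

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Both log-ratios are anchored to \(\pi_{\mathrm{ref}}\); when the preference data lies far from the reference distribution, this anchoring works against the desired update, which is exactly the mismatch analyzed below.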

Key Contributions and Methodology

Reference Mismatch Analysis

The paper identifies two primary scenarios of reference mismatch:

  1. Reference-chosen mismatch: the preferred (chosen) images in the dataset lie far from the reference model's distribution, so divergence regularization actively resists moving the model toward what the data prefers.
  2. Reference-rejected mismatch: the dispreferred (rejected) images are already unlikely under the reference model, so pushing them down further yields only trivial optimization gains.

Introduction of MaPO

Motivated by these observations, MaPO is designed to operate without any reference model. The method jointly maximizes the likelihood margin between the preferred and dispreferred images and the likelihood of the preferred set itself. This allows MaPO to instill general stylistic preferences while aligning closely with human feedback.
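
The exact objective is specified in the paper; purely as an illustration of the reference-free, margin-based idea, the sketch below uses per-sample diffusion denoising errors as proxies for (negative) log-likelihood. The function name and the coefficients beta and gamma are assumptions made for this sketch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def margin_aware_loss(chosen_err, rejected_err, beta=1.0, gamma=1.0):
    """Illustrative reference-free, margin-based preference loss.

    chosen_err / rejected_err: per-sample denoising MSE of the diffusion
    model on preferred / dispreferred images (lower error ~ higher likelihood).
    beta, gamma: assumed scaling coefficients, not taken from the paper.
    """
    # Margin term: -log(sigmoid(x)) shrinks as the (rejected - chosen)
    # error gap grows, i.e., as preferred images become relatively more
    # likely than dispreferred ones under the current model.
    margin_loss = -F.logsigmoid(beta * (rejected_err - chosen_err))

    # Likelihood term: keep fitting the preferred images directly, so the
    # margin is not "won" by degrading both image sets.
    chosen_loss = chosen_err

    return (margin_loss + gamma * chosen_loss).mean()

# Toy usage: random per-sample errors for a batch of 4 preference pairs.
chosen_err = torch.rand(4)
rejected_err = torch.rand(4) + 0.5
loss = margin_aware_loss(chosen_err, rejected_err)
```

Note that no reference model appears anywhere in the computation: the only signals are the current model's own errors on the chosen and rejected images.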

Experimental Evaluation

To validate MaPO's effectiveness, two new pairwise preference datasets were introduced (an illustrative record layout is sketched after the list):

  1. Pick-Style: Comprising image pairs focusing on style preferences such as cartoons and pixel art.
  2. Pick-Safety: Comprising image pairs with safety considerations like avoiding explicit content.
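
Concretely, each entry in such a pairwise preference dataset can be viewed as a (prompt, preferred image, dispreferred image) triple. The field names below are a hypothetical layout for illustration only, not the released datasets' actual schema:

```python
# Hypothetical record layout for a single pairwise preference entry.
# Field names and paths are illustrative, not the datasets' real schema.
example_record = {
    "prompt": "a cozy cabin in the woods, cartoon style",
    "chosen_image": "images/cabin_cartoon.png",      # matches the target style
    "rejected_image": "images/cabin_photoreal.png",  # violates the preference
}
```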

The paper reports substantial improvements in alignment using MaPO on both datasets. Notably:

  • In experiments using Pick-Style, MaPO showed a win rate of at least 73% against existing methods and explicit style prompting strategies.
  • In the Pick-Safety dataset, MaPO consistently outperformed other methods in generating safer images even when facing adversarial prompts.

Furthermore, when used with the general preference dataset Pick-a-Pic v2, the MaPO fine-tuned model outperformed state-of-the-art text-to-image diffusion models across multiple public benchmarks, achieving an aesthetics score of 6.17 and ranking highly in anonymous user-preference evaluations on Imgsys.

Theoretical and Practical Implications

Theoretical Advancements

The introduction of MaPO highlights the impact of reference mismatch on training and proposes a fundamentally different approach. By eliminating reliance on reference models, it improves the model's adaptability and learning efficiency even when the preference data is limited or diverges substantially from the model's original distribution.

Practical Impacts

Practically, MaPO improves memory efficiency and training speed: the paper reports that MaPO reduces training time by 14.5% and memory consumption by 17.5%, making a strong case for its use in real-world generative tasks. This efficiency, coupled with improved alignment and aesthetic scores, positions MaPO as a powerful tool for personalized and safer generative applications.

Future Directions

The development of MaPO opens several avenues for future research. These include:

  • Exploring MaPO in other generative settings beyond text-to-image models.
  • Investigating the scalability of MaPO for larger and more complex datasets.
  • Integrating MaPO with real-time user feedback systems to enhance personalization.
  • Extending the theoretical framework to better understand the dynamics of margin-aware optimization in various generative contexts.

Conclusion

In conclusion, the paper presents a significant advancement in preference optimization methodologies for diffusion models. By situating MaPO within the identified challenges of reference mismatches, the authors provide a robust and efficient pathway for aligning modern diffusion models more closely with human preferences. The methodological rigor and substantial empirical results make MaPO a valuable contribution to the field of generative modeling and AI alignment.
