Emergent Mind

Abstract

Portrait stylization is a challenging task involving the transformation of an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved the quality of outcomes in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve distinct characteristics of an input, such as skin tone, while maintaining stylization quality has remained lacking, and this gap has hindered wide deployment. To address these issues, this study proposes a portrait stylization framework that incorporates a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). In experiments, NCIM substantially improved explicit content filtering, and STAPSM accurately represented a diverse range of skin tones. The proposed framework has been successfully deployed in practice and has satisfied critical requirements of real-world applications.

Figure: Architecture of STAPSM, featuring fine-tuning with skin-tone augmentation and a progressive inference phase.

Overview

  • The paper introduces a novel generative AI framework for portrait stylization that is both skin-tone aware and capable of identifying nudity content to improve ethical standards in content generation.

  • It features a skin-tone-aware portrait stylization module (STAPSM) that maintains diverse skin tones through a fine-tuning phase with skin-tone spectrum augmentation and a progressive inference phase.

  • The nudity content identification module (NCIM) utilizes CLIP embedding classifiers and BLIP caption-based keyword matching to filter out explicit content efficiently.

  • The framework has been empirically evaluated, showcasing superior performance in preserving skin tones and filtering nudity, and has seen successful real-world application in generating over 2 million images for Webtoon IPs.

A Framework Combining Portrait Stylization with Skin-Tone Awareness and Nudity Content Identification

Introduction to the Framework

Generative AI, notably through models like Stable Diffusion (SD), has significantly advanced portrait stylization: transforming input images into distinctive styles while preserving inherent characteristics such as skin tone. However, filtering harmful content effectively and preserving skin-tone characteristics without compromising stylization quality has remained challenging. This paper introduces a framework that addresses both concerns by integrating a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). The framework retains a broad spectrum of skin tones and strengthens explicit content filtering, making it suitable for real-world applications.

Core Components of the Framework

Skin-Tone-Aware Portrait Stylization Module (STAPSM)

STAPSM combines a fine-tuning phase with skin-tone spectrum augmentation and a progressive inference phase, maintaining the input's skin tones while achieving high-quality stylization. The augmentation step refines the training dataset to ensure a diverse representation of skin tones. Inference uses a two-stage image-to-image (I2I) translation approach that applies different denoising strengths and image conditions at each stage, preserving both the skin tone of the input and the unique features of each IP.
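The staged inference described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `denoise` stub stands in for a full diffusion img2img pass, and the stage count, strengths, and condition names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    """One I2I pass: how strongly to re-draw, and which condition guides it."""
    denoising_strength: float  # 0.0 = keep input unchanged, 1.0 = fully re-generate
    condition: str             # e.g. an edge map or the raw portrait (unused by the stub)

def denoise(pixels, style_target, strength):
    """Stub for a diffusion img2img pass: blends the current image toward the
    style target in proportion to the denoising strength. A real pass would
    also be guided by the stage's image condition."""
    return [(1 - strength) * p + strength * t
            for p, t in zip(pixels, style_target)]

def progressive_inference(portrait, style_target, stages):
    """Run the portrait through successive I2I passes. An earlier stage with
    higher strength imposes the style; a later stage with lower strength
    refines the result so skin tone and identity from the input survive."""
    image = portrait
    for stage in stages:
        image = denoise(image, style_target, stage.denoising_strength)
    return image

# Hypothetical two-stage schedule: strong stylization, then gentle refinement.
stages = [Stage(0.7, "lineart"), Stage(0.3, "input_portrait")]
result = progressive_inference([0.2, 0.5], [1.0, 1.0], stages)
```

Note the design point the sketch encodes: because the second pass uses a low denoising strength, most of what the first pass preserved from the input (here, pixel values standing in for skin tone) carries through to the final image.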

Nudity Content Identification Module (NCIM)

NCIM combines a CLIP-embedding classifier with BLIP caption-based keyword matching to filter harmful content effectively. Informed by an analysis of the biases and limitations of existing nudity filters, NCIM improves reliability in preventing the inadvertent generation or sharing of explicit content, helping generated images meet content and ethical standards.
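One plausible way to combine the two signals is to flag an image when either fires. The sketch below is an assumption-laden illustration, not the paper's method: the threshold, keyword list, and function names are invented for the example, and stub scores replace real CLIP/BLIP model outputs.

```python
# Sketch of NCIM-style filtering: flag an image if EITHER an embedding-based
# classifier is confident OR a generated caption contains a blocked keyword.
# Threshold, keyword list, and inputs below are illustrative only.

BLOCKED_KEYWORDS = {"nude", "naked", "topless"}  # example list, not the paper's

def keyword_match(caption: str) -> bool:
    """BLIP-style check: does the image caption mention explicit content?"""
    words = set(caption.lower().split())
    return bool(words & BLOCKED_KEYWORDS)

def is_explicit(clip_score: float, caption: str, threshold: float = 0.5) -> bool:
    """Combine a classifier probability (e.g. from a classifier over CLIP
    embeddings) with caption keyword matching. OR-ing the two signals trades
    some precision for recall, which suits a safety filter."""
    return clip_score >= threshold or keyword_match(caption)

print(is_explicit(0.12, "a woman in a red dress"))    # low score, clean caption
print(is_explicit(0.12, "a nude figure on a beach"))  # caught by the keyword check
```

The OR-combination means each component covers the other's blind spots: the keyword check catches cases the classifier under-scores, and the classifier catches explicit images whose captions happen to use neutral wording.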

Empirical Evaluation

The framework preserved a diverse range of skin tones, outperforming existing methods in qualitative assessments and in user studies with professionals in the Webtoon industry. NCIM likewise identified and filtered nudity content with high accuracy and reliability, demonstrating the value of combining embedding-based classifiers with keyword-based matching.

Practical Implications and Future Outlook

The deployment of this framework in a real-world portrait stylization service has generated over 2 million images across several popular Webtoon IPs, receiving positive feedback from users for its skin-tone representation capabilities. The robustness of the NCIM has effectively deterred the generation of explicit content, thereby safeguarding the value of IPs. This study not only addresses existing limitations in portrait stylization technologies but also sets a foundation for future research focused on enhancing generative AI's ethical use and inclusivity.

Conclusion

This paper presents a framework that combines skin-tone-aware portrait stylization with effective nudity content identification. It is a significant step toward ethical and inclusive generative AI applications, particularly in settings where preserving user characteristics and preventing harmful content generation are critical. The framework's successful real-world deployment underscores its potential to shape future directions in AI-powered content creation, with an emphasis on ethical considerations and diversity.
