- The paper presents a feed-forward network that reduces processing time by 100x compared to optimization-based methods.
- It employs a multi-scale generative architecture using convolutional layers, upsampling, and noise concatenation to synthesize textures at various resolutions.
- By integrating texture and content loss functions from VGG-19, the approach maintains high perceptual quality in both textures and stylized images.
"Texture Networks: Feed-forward Synthesis of Textures and Stylized Images" (Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016) introduces a feed-forward approach to texture synthesis and style transfer, offering a computationally efficient alternative to the optimization-based method proposed by Gatys et al.
Core Contributions
The paper addresses the computational inefficiencies of existing texture synthesis methods by introducing compact, feed-forward convolutional networks capable of generating high-quality textures and transferring artistic styles in a single pass. The key contributions are as follows:
- Efficient Texture Synthesis: The introduction of "texture networks" enables the synthesis of textures with quality comparable to Gatys et al. while reducing processing time by roughly two orders of magnitude.
- Multi-scale Generative Architecture: The architecture leverages a multi-scale approach, utilizing convolutional layers, upsampling, and noise concatenation to synthesize textures with varying complexities and resolutions. This design facilitates matching the statistical properties of a given texture example.
- Style Transfer: By combining texture and content losses into a hybrid objective, the generative model extends to style transfer, re-rendering an image in the style of a reference artwork while preserving its content as measured by VGG-19 activations.
Technical Deep Dive
The texture networks employ a feed-forward generative network trained with perceptually-motivated loss functions derived from a pre-trained CNN, specifically VGG-19. The model matches the statistical features of a desired texture through Gram matrices, which capture correlations between feature channels (averaged over spatial positions) at multiple layers.
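To make the Gram-matrix statistics concrete, the following PyTorch sketch shows how such channel correlations could be computed from a pre-trained VGG-19 feature extractor. The layer indices, the normalization convention, and the recent torchvision weights API used here are assumptions for illustration, not the paper's exact configuration.

```python
import torch
from torchvision import models

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-by-channel correlations of a feature map, averaged over spatial positions."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)           # flatten spatial dimensions
    gram = torch.bmm(f, f.transpose(1, 2))   # (b, c, c) channel correlation matrix
    return gram / (c * h * w)                # one common normalization convention

# Pre-trained VGG-19, used only as a fixed (frozen) feature extractor.
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def vgg_features(x: torch.Tensor, layer_ids=(1, 6, 11, 20, 29)):
    """Collect activations at a few VGG-19 layers (indices here are assumptions)."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layer_ids:
            feats.append(x)
        if i >= max(layer_ids):
            break
    return feats
```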
Loss Functions
The loss function is critical to the performance of the texture networks. It combines two components: a texture loss, which matches the Gram-matrix statistics of the reference texture, and a content loss, which matches VGG-19 activations of the content image. The overall loss L is a weighted sum of the texture loss (L_texture) and the content loss (L_content):
L = α · L_texture + β · L_content
where α and β are weighting factors that control the relative importance of the texture and content losses.
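Building on the `gram_matrix` and `vgg_features` helpers in the sketch above, the weighted loss could be assembled roughly as follows. The mean-squared-error form, the choice of content layer, and the default weights are illustrative assumptions rather than the paper's exact settings.

```python
import torch.nn.functional as F

def texture_loss(gen_feats, texture_grams):
    """Compare Gram matrices of the generated image against precomputed Gram
    matrices of the reference texture, layer by layer."""
    return sum(F.mse_loss(gram_matrix(f), g) for f, g in zip(gen_feats, texture_grams))

def content_loss(gen_feats, content_feats, layer=-2):
    """Compare raw activations at one (assumed) deeper layer to preserve content."""
    return F.mse_loss(gen_feats[layer], content_feats[layer])

def total_loss(gen_feats, texture_grams, content_feats, alpha=1.0, beta=1.0):
    # L = alpha * L_texture + beta * L_content; alpha and beta are placeholders.
    return alpha * texture_loss(gen_feats, texture_grams) + beta * content_loss(gen_feats, content_feats)
```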
Network Architecture
The network architecture consists of convolutional layers interspersed with upsampling and noise concatenation operations. The use of noise injection helps to introduce stochasticity into the generated textures, preventing the network from simply memorizing the input texture. The multi-scale approach allows the network to capture both fine-grained and coarse-grained texture features.
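A rough sketch of the multi-scale generator idea follows: noise tensors at several resolutions are processed by small convolutional blocks, then upsampled and concatenated from coarse to fine before a final projection to RGB. The number of scales, channel counts, noise depth, and normalization choice here are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Small conv block; kernel size, normalization, and activation are illustrative."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.net(x)

class MultiScaleGenerator(nn.Module):
    """Processes a pyramid of noise tensors from coarse to fine, upsampling the
    running result and concatenating it with features from the next finer scale."""
    def __init__(self, scales=5, channels=8, noise_ch=3):
        super().__init__()
        self.scales = scales
        self.blocks = nn.ModuleList([ConvBlock(noise_ch, channels) for _ in range(scales)])
        self.joins = nn.ModuleList([ConvBlock(2 * channels, channels) for _ in range(scales - 1)])
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=1)

    def forward(self, noise_pyramid):
        # noise_pyramid: list of tensors, coarse to fine, each twice the previous resolution
        x = self.blocks[0](noise_pyramid[0])
        for i in range(1, self.scales):
            x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
            fine = self.blocks[i](noise_pyramid[i])
            x = self.joins[i - 1](torch.cat([x, fine], dim=1))
        return torch.sigmoid(self.to_rgb(x))

# Usage: sample a noise pyramid for a 256x256 output (sizes are assumptions).
gen = MultiScaleGenerator()
sizes = [256 // 2 ** (gen.scales - 1 - i) for i in range(gen.scales)]
noise = [torch.rand(1, 3, s, s) for s in sizes]
img = gen(noise)  # shape: (1, 3, 256, 256)
```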
Empirical Performance
The paper reports texture synthesis quality competitive with the optimization-based baseline at a fraction of the computational cost, making real-time applications such as video processing and mobile implementations feasible. Qualitative comparisons and a computational-efficiency analysis support the perceptual quality and diversity of the generated textures and stylized images.
Quantitative Analysis
While the paper primarily focuses on qualitative results, it also quantifies computational efficiency: the texture networks run roughly two orders of magnitude faster than the optimization-based method of Gatys et al., which makes them far more practical for real-world applications.
Qualitative Analysis
The paper includes a comprehensive set of qualitative comparisons to demonstrate the performance of the texture networks. The generated textures and stylized images exhibit high perceptual quality and diversity, and the results show that the method handles a wide range of textures and styles.
Limitations and Future Research
Despite the advancements, certain styles pose challenges, with the optimization-based method of Gatys et al. showing superior performance in some instances. Future research could refine loss functions or explore deeper network architectures to address this gap. Integration of more complex constraints and loss functions derived from perceptual or semantic priors could further expand the utility of these networks in computer vision and artistic image processing.
In summary, "Texture Networks: Feed-forward Synthesis of Textures and Stylized Images" (Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016) presents a significant advancement in texture synthesis and style transfer by using feed-forward neural networks with sophisticated loss functions. This approach achieves practical computational efficiency and sets the stage for future enhancements in generative visual models.