Correspondence Learning for Controllable Person Image Generation (2012.12440v1)

Published 23 Dec 2020 in cs.CV

Abstract: We present a generative model for controllable person image synthesis, as shown in Figure , that can be applied to two tasks: pose-guided person image synthesis, i.e., converting the pose of a source person image to a target pose while preserving the texture of that source image; and clothing-guided person image synthesis, i.e., changing the clothing texture of a source person image to a desired clothing texture. By explicitly establishing a dense correspondence between the target pose and the source image, we can effectively address the misalignment introduced by pose transfer and generate high-quality images. Specifically, we first generate a target semantic map under the guidance of the target pose, which provides a more accurate pose representation and structural constraints during generation. Then, a decomposed attribute encoder extracts component features, which not only helps establish a more accurate dense correspondence but also enables clothing-guided person generation. Next, we establish a dense correspondence between the target pose and the source image within a shared domain. The source image feature is warped according to this dense correspondence to flexibly account for deformations. Finally, the network renders the output image based on the warped source image feature and the target pose. Experimental results show that our method outperforms state-of-the-art methods in pose-guided person generation and demonstrate its effectiveness in clothing-guided person generation.
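The abstract does not specify how the dense correspondence is computed or how the warp is applied. One common formulation in exemplar-based image translation computes a softmax over pairwise cosine similarities between target-side and source-side features that live in a shared domain. It then warps the source by taking, for each target position, a correspondence-weighted average of source positions. The sketch below assumes that formulation. The function names `dense_correspondence` and `warp`, the temperature value, and the (C, H, W) feature shapes are illustrative assumptions, not the authors' code. Random vectors stand in for the outputs of the semantic-map generator and the decomposed attribute encoder.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_correspondence(target_feat, source_feat, temperature=0.01):
    """Assumed formulation: softmax over cosine similarities.

    target_feat: (C, Ht, Wt) features derived from the target pose/semantic map.
    source_feat: (C, Hs, Ws) features derived from the source image.
    Both are assumed to already lie in a shared feature domain.
    Returns an (Ht*Wt, Hs*Ws) matrix whose rows sum to 1.
    """
    c = target_feat.shape[0]
    t = target_feat.reshape(c, -1).T  # (Nt, C): one feature vector per target position
    s = source_feat.reshape(c, -1).T  # (Ns, C): one feature vector per source position
    # L2-normalise each position so the dot product equals cosine similarity.
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    sim = t @ s.T  # (Nt, Ns) pairwise cosine similarities
    # A small temperature sharpens each row toward its best-matching source position.
    return softmax(sim / temperature, axis=1)


def warp(source_values, corr, out_hw):
    """Warp source features (or pixels) into the target layout.

    source_values: (D, Hs, Ws) array, e.g. source image features or RGB values.
    corr: (Ht*Wt, Hs*Ws) correspondence matrix from dense_correspondence.
    out_hw: (Ht, Wt) spatial size of the target.
    """
    d = source_values.shape[0]
    v = source_values.reshape(d, -1).T  # (Ns, D)
    warped = corr @ v  # (Nt, D): weighted average of source positions per target position
    return warped.T.reshape(d, *out_hw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_feat = rng.standard_normal((64, 4, 4))
    # Synthetic "pose change": target position i takes its content from source position perm[i].
    perm = rng.permutation(16)
    tgt_feat = src_feat.reshape(64, -1)[:, perm].reshape(64, 4, 4)
    corr = dense_correspondence(tgt_feat, src_feat)
    image = rng.uniform(size=(3, 4, 4))
    warped_image = warp(image, corr, (4, 4))
    print(np.array_equal(corr.argmax(axis=1), perm))
```

In this toy setup the target features are an exact rearrangement of the source features, so each row of the correspondence matrix peaks at the matching source position. Warping the source image then reproduces that rearrangement. Computing a soft weighted average keeps the warp differentiable, which is why this style of correspondence is usually trained end to end. It costs memory quadratic in the number of spatial positions.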