
Facial Expression Video Generation Based-On Spatio-temporal Convolutional GAN: FEV-GAN (2210.11182v1)

Published 20 Oct 2022 in cs.CV

Abstract: Facial expression generation has always been an intriguing task for scientists and researchers all over the globe. In this context, we present our novel approach for generating videos of the six basic facial expressions. Starting from a single neutral facial image and a label indicating the desired facial expression, we aim to synthesize a video of the given identity performing the specified facial expression. Our approach, referred to as FEV-GAN (Facial Expression Video GAN), is based on Spatio-temporal Convolutional GANs, which are known to model both content and motion in the same network. Previous methods based on such networks have shown a good ability to generate coherent videos with smooth temporal evolution; however, they still suffer from low image quality and poor identity preservation. In this work, we address this problem by using a generator composed of two image encoders: the first is pre-trained for facial identity feature extraction, and the second extracts spatial features. We have qualitatively and quantitatively evaluated our model on two international facial expression benchmark databases: MUG and Oulu-CASIA NIR&VIS. The experimental analysis demonstrates the effectiveness of our approach in generating videos of the six basic facial expressions while preserving the input identity. The analysis also shows that using both identity and spatial features enhances the decoder's ability to preserve the identity and generate high-quality videos. The code and the pre-trained model will soon be made publicly available.
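The abstract describes a generator that fuses two feature streams (identity and spatial) from a single neutral image with an expression label, then decodes them into a full video. The data flow can be sketched at the level of tensor shapes; everything below (frame size, feature dimensions, frame count, the random linear maps standing in for CNN encoders and the spatio-temporal decoder) is an illustrative assumption, not the actual FEV-GAN implementation.

```python
import numpy as np

# Shape-level sketch of the FEV-GAN generator data flow described in the
# abstract. All dimensions and the linear maps are illustrative assumptions;
# the real model uses pre-trained CNN encoders and 3D (spatio-temporal)
# convolutions rather than flat random projections.

H = W = 32          # assumed frame size
T = 8               # assumed number of generated frames
N_EXPR = 6          # the six basic expressions
D_ID = D_SP = 64    # assumed identity / spatial feature dimensions

def encode(image, out_dim, seed):
    """Stand-in for a CNN encoder: flatten + fixed random projection."""
    w = np.random.default_rng(seed).normal(size=(image.size, out_dim))
    return image.reshape(-1) @ w

def generate_video(neutral_image, expr_label):
    """Map (neutral face, expression label) -> video tensor (T, H, W, 3)."""
    f_id = encode(neutral_image, D_ID, seed=1)   # identity features
    f_sp = encode(neutral_image, D_SP, seed=2)   # spatial features
    onehot = np.eye(N_EXPR)[expr_label]          # desired expression
    latent = np.concatenate([f_id, f_sp, onehot])
    # Stand-in decoder: one linear map to all T frames jointly, mimicking
    # how a spatio-temporal decoder produces content and motion together.
    w_dec = np.random.default_rng(3).normal(size=(latent.size, T * H * W * 3))
    return np.tanh(latent @ w_dec).reshape(T, H, W, 3)

neutral = np.random.default_rng(0).normal(size=(H, W, 3))
video = generate_video(neutral, expr_label=3)
print(video.shape)  # (8, 32, 32, 3)
```

The point of the sketch is the fusion step: concatenating a dedicated identity embedding with the spatial features before decoding is what the abstract credits with improving identity preservation over single-encoder spatio-temporal GANs.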
