Driving Animatronic Robot Facial Expression From Speech (2403.12670v3)

Published 19 Mar 2024 in cs.RO and cs.CV

Abstract: Animatronic robots hold the promise of enabling natural human-robot interaction through lifelike facial expressions. However, generating realistic, speech-synchronized robot expressions poses significant challenges due to the complexities of facial biomechanics and the need for responsive motion synthesis. This paper introduces a novel, skinning-centric approach to drive animatronic robot facial expressions from speech input. At its core, the proposed approach employs linear blend skinning (LBS) as a unifying representation, guiding innovations in both embodiment design and motion synthesis. LBS informs the actuation topology, facilitates human expression retargeting, and enables efficient speech-driven facial motion generation. This approach demonstrates the capability to produce highly realistic facial expressions on an animatronic face in real-time at over 4000 fps on a single Nvidia RTX 4090, significantly advancing robots' ability to replicate nuanced human expressions for natural interaction. To foster further research and development in this field, the code has been made publicly available at: https://github.com/library87/OpenRoboExp
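To make the skinning-centric idea concrete, the sketch below shows plain linear blend skinning, the representation the abstract says unifies embodiment design and motion synthesis: each mesh vertex is deformed by a weight-blended combination of per-actuator (bone) transforms. This is a minimal NumPy illustration under stated assumptions, not the paper's OpenRoboExp implementation; the function name, array shapes, and toy data are assumptions for the example.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, bone_transforms, skinning_weights):
    """Deform a mesh with linear blend skinning (LBS).

    rest_vertices    : (V, 3) rest-pose vertex positions
    bone_transforms  : (B, 4, 4) homogeneous transform per bone/actuation unit
    skinning_weights : (V, B) per-vertex blend weights, each row summing to 1
    Returns the deformed (V, 3) vertex positions.
    """
    num_vertices = rest_vertices.shape[0]
    # Homogeneous coordinates for the rest pose: (V, 4)
    rest_h = np.concatenate([rest_vertices, np.ones((num_vertices, 1))], axis=1)
    # Apply every bone transform to every vertex: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, rest_h)
    # Blend the per-bone results with the skinning weights: (V, 4)
    blended = np.einsum("vb,bvi->vi", skinning_weights, per_bone)
    return blended[:, :3]

# Toy usage (hypothetical data): two "actuators", the second lifting its
# region of the face by 0.1 along y.
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])
T = np.stack([np.eye(4), np.eye(4)])
T[1, :3, 3] = [0.0, 0.1, 0.0]
W = np.array([[1.0, 0.0],   # first vertex follows only the first actuator
              [0.5, 0.5]])  # second vertex blends both equally
print(linear_blend_skinning(verts, T, W))
# -> [[0.   0.    0.  ]
#     [1.   0.05  0.  ]]
```

In a speech-driven pipeline of the kind the abstract describes, a learned model would presumably predict the per-frame transforms (or coefficients over them) from audio features, and the blending step above is cheap enough to support the reported real-time rates; the exact prediction architecture is detailed in the paper and repository, not here.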
