
Learning Multimodal Latent Dynamics for Human-Robot Interaction (2311.16380v1)

Published 27 Nov 2023 in cs.RO, cs.HC, and cs.LG

Abstract: This article presents a method for learning well-coordinated Human-Robot Interaction (HRI) from Human-Human Interactions (HHI). We devise a hybrid approach using Hidden Markov Models (HMMs) as the latent space priors for a Variational Autoencoder to model a joint distribution over the interacting agents. We leverage the interaction dynamics learned from HHI to learn HRI and incorporate the conditional generation of robot motions from human observations into the training, thereby predicting more accurate robot trajectories. The generated robot motions are further adapted with Inverse Kinematics to ensure the desired physical proximity with a human, combining the ease of joint space learning and accurate task space reachability. For contact-rich interactions, we modulate the robot's stiffness using HMM segmentation for a compliant interaction. We verify the effectiveness of our approach deployed on a Humanoid robot via a user study. Our method generalizes well to various humans despite being trained on data from just two humans. We find that Users perceive our method as more human-like, timely, and accurate and rank our method with a higher degree of preference over other baselines.

Authors (6)

  1. Vignesh Prasad
  2. Lea Heitlinger
  3. Dorothea Koert
  4. Ruth Stock-Homburg
  5. Jan Peters
  6. Georgia Chalvatzaki
Citations (2)

Summary

  • The paper introduces a novel hybrid approach combining HMMs and VAEs to effectively learn and predict interactive motion dynamics.
  • It shows that adapting the generated motions with inverse kinematics ensures the desired physical proximity to the human, combining the ease of joint-space learning with accurate task-space reachability.
  • Results from a humanoid robot study reveal enhanced human-likeness, timeliness, and overall preference over baseline methods.

Understanding Human-Robot Interaction Dynamics

Conceptualization

The study of Human-Robot Interaction (HRI) is crucial for advancing collaborative and assistive robotics. A critical feature of effective HRI is robot motion that is synchronized and well-coordinated with human actions. Achieving this requires systems that can understand and predict human motion and generate corresponding responsive robot motions, a task often approached by learning from Human-Human Interactions (HHI). One such approach combines insights from HHI with machine learning models, grounding the learning process in a latent representation of the interaction dynamics between human and robot.

Methodology

The paper's hybrid approach integrates Hidden Markov Models (HMMs) with Variational Autoencoders (VAEs), using HMMs as latent-space priors within a VAE to model a joint distribution over the interacting agents. By incorporating the conditional generation of robot motions from human observations directly into training, the model predicts more accurate robot trajectories. In real-world deployment, however, the generated motions need fine-tuning to interact with a specific human: Inverse Kinematics (IK) adapts the robot's trajectories so that its movements are not only accurate in joint space but also reach the desired physical proximity to the human in task space. Additionally, for contact-rich interactions such as handshakes, the robot's stiffness is modulated using the HMM segmentation, yielding a more compliant and lifelike interaction.
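The core mechanism of conditionally generating robot motion from human observations under an HMM prior can be sketched as follows. This is a minimal, hypothetical illustration: the function and variable names are our own, and the paper actually performs this conditioning in a learned VAE latent space rather than on raw trajectories. It shows the standard pattern of an HMM forward pass over the human's part of the joint latent, followed by per-state Gaussian conditioning blended by the state responsibilities.

```python
# Hypothetical sketch: predicting the robot's latent trajectory from the
# human's observed latent trajectory under a joint HMM. All names here
# (means, covs, trans, init) are illustrative assumptions, not the paper's API.
import numpy as np

def gauss_pdf(x, mu, S):
    """Density of a multivariate Gaussian N(mu, S) at x."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
        np.linalg.det(2 * np.pi * S))

def condition_robot_on_human(z_h, means, covs, trans, init):
    """Infer robot latents z_r given human latents z_h under a joint HMM.

    z_h   : (T, d_h) observed human latent trajectory
    means : (K, d_h + d_r) per-state means over the joint latent [z_h, z_r]
    covs  : (K, d_h + d_r, d_h + d_r) per-state joint covariances
    trans : (K, K) state transition matrix
    init  : (K,) initial state distribution
    """
    K, d_h = len(means), z_h.shape[1]
    T = len(z_h)
    alpha = np.zeros((T, K))                      # forward probabilities
    z_r = np.zeros((T, means.shape[1] - d_h))     # predicted robot latents
    for t in range(T):
        # Likelihood of the human observation under each state's marginal.
        lik = np.array([gauss_pdf(z_h[t], means[k][:d_h], covs[k][:d_h, :d_h])
                        for k in range(K)])
        pred = init if t == 0 else alpha[t - 1] @ trans
        alpha[t] = pred * lik
        alpha[t] /= alpha[t].sum()
        # Gaussian conditioning per state, blended by state responsibilities.
        for k in range(K):
            mu_h, mu_r = means[k][:d_h], means[k][d_h:]
            S_hh = covs[k][:d_h, :d_h]
            S_rh = covs[k][d_h:, :d_h]
            z_r[t] += alpha[t, k] * (
                mu_r + S_rh @ np.linalg.solve(S_hh, z_h[t] - mu_h))
    return z_r
```

The predicted robot latents would then be decoded to joint-space motion by the VAE decoder and refined with IK, as described above.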

Experimental Results

The paper reports on deploying this approach on a humanoid robot in a user study assessing the effectiveness of the trained model. Despite being trained on demonstrations from just two humans, the model generalizes well to various users and was preferred over the baseline methods. The hybrid training procedure pays dividends particularly in how human partners perceive the robot's movements: participants rated them as more human-like, timely, and accurate, and ranked the method with a higher degree of preference.
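The compliance behavior deployed in the contact-rich phases can likewise be sketched. The following is a hypothetical illustration of HMM-segment-based stiffness modulation: the specific states, gain values, and interface are our own assumptions, not the paper's controller, but the pattern — blending per-state stiffness by the HMM state responsibilities, with lower stiffness in contact segments — matches the idea described above.

```python
# Hypothetical sketch of HMM-segment-based stiffness modulation for a
# contact-rich interaction (e.g. a handshake). Gains and state roles are
# illustrative assumptions, not values from the paper.
import numpy as np

# Per-state stiffness scale: high while reaching in free space (states 0-1),
# low once the segmentation indicates contact (states 2-3).
STIFFNESS = {0: 1.0, 1: 1.0, 2: 0.35, 3: 0.35}

def stiffness_profile(state_probs):
    """Blend per-state stiffness by the HMM state responsibilities.

    state_probs : (T, K) forward probabilities from the HMM
    returns     : (T,) stiffness scaling per timestep
    """
    gains = np.array([STIFFNESS[k] for k in range(state_probs.shape[1])])
    return state_probs @ gains
```

Because the blend follows the soft state responsibilities rather than a hard segment switch, the commanded stiffness transitions smoothly as the interaction enters and leaves contact.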

Conclusions

This work marks a step forward in the nuanced field of HRI, showcasing a methodology that lets robots learn from HHI through a blend of statistical and deep learning methods. The success of the approach is evident in generated movements that humans perceive as more natural and cooperative within an interaction. The system's ability not only to execute learned motions but also to adapt them in real time to the specific human partner opens opportunities for more personalized and intuitive HRI.