Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 131 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 26 tok/s Pro
GPT-5 High 32 tok/s Pro
GPT-4o 71 tok/s Pro
Kimi K2 192 tok/s Pro
GPT OSS 120B 385 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction (2405.13477v1)

Published 22 May 2024 in cs.HC, cs.SD, and eess.AS

Abstract: With current state-of-the-art automatic speech recognition (ASR) systems, it is not possible to transcribe overlapping speech audio streams separately. Consequently, when these ASR systems are used as part of a social robot like Pepper for interaction with a human, it is common practice to close the robot's microphone while it is talking itself. This prevents the human users to interrupt the robot, which limits speech-based human-robot interaction. To enable a more natural interaction which allows for such interruptions, we propose an audio processing pipeline for filtering out robot's ego speech using only a single-channel microphone. This pipeline takes advantage of the possibility to feed the robot ego speech signal, generated by a text-to-speech API, as training data into a machine learning model. The proposed pipeline combines a convolutional neural network and spectral subtraction to extract overlapping human speech from the audio recorded by the robot-embedded microphone. When evaluating on a held-out test set, we find that this pipeline outperforms our previous approach to this task, as well as state-of-the-art target speech extraction systems that were retrained on the same dataset. We have also integrated the proposed pipeline into a lightweight robot software development framework to make it available for broader use. As a step towards demonstrating the feasibility of deploying our pipeline, we use this framework to evaluate the effectiveness of the pipeline in a small lab-based feasibility pilot using the social robot Pepper. Our results show that when participants interrupt the robot, the pipeline can extract the participant's speech from one-second streaming audio buffers received by the robot-embedded single-channel microphone, hence in near-real time.

Citations (2)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Questions

We haven't generated a list of open questions mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

This paper has been mentioned in 2 tweets and received 3 likes.

Upgrade to Pro to view all of the tweets about this paper:

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube